Data Screening and Adjustments. Data Screening for Errors
- Christiana Stanley
- 6 years ago
Purpose of data screening and adjustments:
- Detect and correct data errors
- Detect and treat missing data
- Detect and handle insufficiently sampled variables (e.g., rare species)
- Conduct transformations and standardizations
- Detect and handle outliers

Data Screening for Errors
- Examine summary statistics (e.g., n, mean, min, max) and check for irregularities, such as missing records ("Where did all the data go?") or unrealistic values.
- Action: correct errors in the raw data.
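The summary-statistics check described above can be sketched in a few lines of NumPy. This is only an illustration: the function name `summarize_columns`, the variable names, and the example values are assumptions made for the sketch, not part of the original slides.

```python
import numpy as np

def summarize_columns(data, names):
    """Return n (non-missing), mean, min, and max for each variable (column)."""
    data = np.asarray(data, dtype=float)
    summary = {}
    for j, name in enumerate(names):
        col = data[:, j]
        present = col[~np.isnan(col)]  # ignore missing values coded as NaN
        empty = present.size == 0
        summary[name] = {
            "n": int(present.size),
            "mean": float("nan") if empty else float(present.mean()),
            "min": float("nan") if empty else float(present.min()),
            "max": float("nan") if empty else float(present.max()),
        }
    return summary

# Hypothetical data: rows are sample units, columns are variables.
example = np.array([[7.1, 12.0], [6.8, np.nan], [70.2, 15.0]])
stats = summarize_columns(example, ["pH", "depth"])
for name, values in stats.items():
    print(name, values)
```

In this made-up example, a reduced n for `depth` flags a missing record and a maximum of 70.2 for `pH` flags an unrealistic value, which are the two kinds of irregularity the screening step looks for.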
Data Screening for Missing Data
- Evaluate the amount and pattern of missing data and take corrective action, if needed (e.g., median replacement).
- Action: replace with prior knowledge; insert means or medians; use regression to estimate values.

Data Screening for Sufficiency
- Check for and drop insufficient variables
  - E.g., rare species in community datasets

Sufficiency is the extent to which each variable (e.g., each species' ecological character) is accurately and meaningfully described by the data. For example, species with very few records are not likely to be accurately placed in ecological space. You must decide at what level of frequency of occurrence you want to accept the message and eliminate species below this level.
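As a minimal sketch of the median-replacement option mentioned above, assuming missing values are coded as NaN and that every column has at least one observed value (the function name and data are hypothetical):

```python
import numpy as np

def fill_missing_with_median(data):
    """Replace NaN entries with the median of the observed values in their column."""
    data = np.array(data, dtype=float)       # copy, so the raw data stay untouched
    medians = np.nanmedian(data, axis=0)     # column medians ignoring NaN
    rows, cols = np.where(np.isnan(data))    # positions of missing entries
    data[rows, cols] = medians[cols]
    return data

raw = np.array([[1.0, np.nan],
                [3.0, 4.0],
                [np.nan, 8.0]])
filled = fill_missing_with_median(raw)
print(filled)
```

Keeping the raw matrix unchanged makes it easy to compare analyses with and without the replacement.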
Data Screening for Sufficiency: Other Issues
- Influence of abundant generalists in community datasets. Abundant generalists define strong dimensions of the data cloud that have no meaningful pattern on them. They can overwhelm the message of rarer species in some types of analysis. You must decide whether to include or exclude these dominant species.
- Variables with too little variation (i.e., no signature). Variables with too little variation have no meaningful pattern (or influence) and are therefore unnecessary.

[Figure: typical community dataset, showing percent occurrence by species, with dominant species, rare species, and the median occurrence marked.]
Data Screening for Sufficiency: Some Rules of Thumb
- Drop insufficient variables (species) and conduct sensitivity analysis
  - Rare species (e.g., < …% occurrence): too few occurrences?
  - Too little variability (e.g., < …-…% CV): too little variation?
- Drop abundant generalist species and conduct sensitivity analysis
  - Dominant species (e.g., > 9…% occurrence): too ubiquitous?
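The rules of thumb above could be applied roughly as follows. The cut-offs `min_occ` and `max_occ` are hypothetical parameters to be chosen by the analyst (the thresholds on the slides are not legible in this transcript), and the function names are illustrative. The CV helper assumes every column has a nonzero mean.

```python
import numpy as np

def screen_by_occurrence(abund, min_occ, max_occ):
    """Flag species (columns) whose frequency of occurrence lies within [min_occ, max_occ]."""
    abund = np.asarray(abund, dtype=float)
    occ = (abund > 0).mean(axis=0)           # proportion of sample units occupied
    keep = (occ >= min_occ) & (occ <= max_occ)
    return keep, occ

def percent_cv(data):
    """Coefficient of variation (%) of each column: 100 * sample SD / mean."""
    data = np.asarray(data, dtype=float)
    return 100.0 * data.std(axis=0, ddof=1) / data.mean(axis=0)

# Hypothetical community matrix: 4 sample units (rows) by 3 species (columns).
abund = np.array([[5, 0, 1],
                  [3, 0, 0],
                  [8, 2, 0],
                  [4, 0, 6]])
keep, occ = screen_by_occurrence(abund, min_occ=0.3, max_occ=0.9)
screened = abund[:, keep]                    # drops the ubiquitous and the rare species
cv = percent_cv(np.array([[1.0, 2.0], [1.0, 4.0]]))
```

Rerunning the downstream analysis with a few different cut-offs is one simple way to carry out the sensitivity analysis the slides recommend.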
Data Transformations & Standardizations
Purpose:
- Statistical
  - Improve assumptions of normality, linearity, homogeneity of variance, etc.
  - Make units of variables comparable when measured on different scales.
- Ecological
  - Make ecological distance measures work better.
  - Reduce effect of total quantity in sample units, to put focus on relative quantities.
  - Equalize (or otherwise alter) the relative importance of variables (e.g., common and rare species).
  - Emphasize informative variables (species) at the expense of uninformative variables (species).

Examples:
- Log transformation: $b_{ij} = \log(x_{ij} + 1)$
- Column z-score standardization: $b_{ij} = (x_{ij} - \bar{x}_j)/s_j$

Transformations are applied to each element of the data matrix, independent of the other elements. Standardizations adjust matrix elements by a row or column standard (e.g., max, sum, etc.).
Monotonic Transformations
When to transform?
- To adjust for highly skewed variables
- To better meet assumptions of statistical test (e.g., normality, constant variance, etc.)
- To emphasize presence/absence (nonquantitative) signature

Which transformation?
- Depends on type of data
- Whichever works best

Binary (presence/absence) transformation: $b_{ij} = x_{ij}^{0}$ (power), with zeros remaining 0
Acceptable domain of x: all. Range of f(x): 0 and 1 only.
- Converts quantitative data into nonquantitative data
- Applicable for species data
- Most useful when there is little quantitative information present
- Can be a severe transformation
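A minimal sketch of the binary transformation, written as an explicit comparison rather than a power so that zeros stay zero (NumPy evaluates `0 ** 0` as 1). The function name and data are illustrative.

```python
import numpy as np

def to_presence_absence(abund):
    """Convert abundances to 1 (present, x > 0) or 0 (absent)."""
    return (np.asarray(abund) > 0).astype(int)

abund = np.array([[0, 3],
                  [12, 0]])
binary = to_presence_absence(abund)
print(binary)
```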
Monotonic Transformations: Log Transformation
Log transformation: $b_{ij} = \log(x_{ij} + 1)$
Acceptable domain of x: > 0. Range of f(x): all.
- Compresses high values and spreads low values by expressing values as orders of magnitude
- Useful when there is a high degree of variation; ratio of largest to smallest > …; highly positively skewed data

[Figure: example of the log transformation.]
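The log transformation above might be applied as in the sketch below. Base 10 is an assumption, chosen because the slide speaks of orders of magnitude; the function name is illustrative.

```python
import numpy as np

def log_transform(x):
    """Apply b = log10(x + 1) element by element; adding 1 keeps zeros defined."""
    x = np.asarray(x, dtype=float)
    if np.any(x < 0):
        raise ValueError("log(x + 1) is intended for non-negative data")
    return np.log10(x + 1.0)

logged = log_transform([0, 9, 99, 999])
print(logged)
```

Values spanning three orders of magnitude end up evenly spaced, which is the compression of high values the slide describes.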
Monotonic Transformations: Square Root and Power Family
Square root transformation: $b_{ij} = x_{ij}^{1/2}$ (power)
Acceptable domain of x: ≥ 0. Range of f(x): ≥ 0.
- Similar in effect to, but less dramatic than, the log transformation
- Often used with count (meristic) data; e.g., when mean equals the variance (Poisson distribution)

[Figure: power transformations, b plotted against x for several values of p.]

Power family transformation: $b_{ij} = x_{ij}^{1/p}$
Acceptable domain of x: ≥ 0. Range of f(x): ≥ 0.
- Different exponents change the effect of the transformation; the smaller the exponent, the more compression applied to high values
- Flexible transformation useful for a wide variety of data
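To illustrate how the exponent controls the amount of compression, here is a small sketch. The parameter `exponent` is the overall power applied to x (0.5 for the square root); the name and example values are assumptions for the illustration.

```python
import numpy as np

def power_transform(x, exponent):
    """Raise each non-negative element to the given power."""
    x = np.asarray(x, dtype=float)
    if np.any(x < 0):
        raise ValueError("power transformations here assume non-negative data")
    return x ** exponent

counts = np.array([0.0, 1.0, 16.0, 81.0])
square_root = power_transform(counts, 0.5)    # moderate compression
fourth_root = power_transform(counts, 0.25)   # smaller exponent, stronger compression
```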
[Figure: example of the power family transformation.]

Monotonic Transformations: Arcsine Square Root
Arcsine square root transformation: $b_{ij} = (2/\pi)\,\arcsin(x_{ij}^{1/2})$
Acceptable domain of x: 0-1. Range of f(x): 0-1.
- Spreads the ends of the scale while compressing the middle for proportion data
- Useful for proportion data with positive skew (can use arcsine transformation for negative skew)
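A short sketch of the arcsine square root transformation for proportions. The factor 2/π rescales the result of `arcsin`, which lies in [0, π/2], back to the 0-1 range. The function name is illustrative.

```python
import numpy as np

def arcsine_sqrt(p):
    """Apply b = (2 / pi) * arcsin(sqrt(p)) to proportions in [0, 1]."""
    p = np.asarray(p, dtype=float)
    if np.any((p < 0) | (p > 1)):
        raise ValueError("proportions must lie in [0, 1]")
    return (2.0 / np.pi) * np.arcsin(np.sqrt(p))

transformed = arcsine_sqrt([0.0, 0.5, 1.0])
print(transformed)
```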
[Figure: example of the arcsine square root transformation.]

Monotonic Transformations: Some Rules of Thumb
- Use a log or square root transformation for highly skewed data or data ranging over several (> …) orders of magnitude
- Use the arcsine square root transformation for proportion data
- If applied to a related variable set (e.g., species), then use the same transformation (e.g., log) so that all are scaled the same; otherwise, transform independently
- Consider a binary (presence/absence) transformation when:
  - percent zeros is high (say > …%)
  - number of distinct values is low (say < …)
  - beta diversity is high (say > …)
Standardizations
When to standardize?
- To place on equal footing highly unequal sample units or variables (species)
- To better represent the patterns of interest

Which standardization?
- Depends on objective (sample or variable adjustment) and statistical technique (ordination, cluster, etc.)
- Which standard (variance, totals, max, etc.) makes sense?

Examples:
- Row max: $b_{ij} = x_{ij}/\max(x_i)$
- Column z-score: $b_{ij} = (x_{ij} - \bar{x}_j)/s_j$
- Row z-score: $b_{ij} = (x_{ij} - \bar{x}_i)/s_i$

Standardizations adjust matrix elements by a row or column standard (e.g., max, sum, etc.). All standardizations can be applied to either rows or columns (or both).
Column or Row Standardizations?
Column standardization
- When the principal concern is to adjust for differences (e.g., variances, total abundance, ubiquity) among variables (species) in order to place them on equal footing.
- When the focus is on the profile across sample units.

Row standardization
- When the principal concern is to adjust for differences (e.g., total abundance, diversity) among sample units in order to place them on equal footing.
- When the focus is on the profile within a sample unit.

Common Standardizations
- Total: divide by margin total
- Max: divide by margin maximum
- Range: standardize values to range 0-1
- Frequency: divide by margin maximum and multiply by number of non-zero items, so that the average of non-zero items is 1
- Hellinger: square root of method = total
- Normalization: make margin sums of squares equal to 1
- Standardize: scale to zero mean and unit variance (z-scores)
- Chi.square: divide by row sums and square root of column sums, and adjust for square root of matrix total
Column Z-score Standardization: $b_{ij} = (x_{ij} - \bar{x}_j)/s_j$
Acceptable domain of x: all. Range of f(x): all.
- Converts data to z-scores (mean = 0, variance = 1)
- Commonly used to place variables on equal footing
- Essential when variables have different scales or units of measurement

Column Total Standardization: $b_{ij} = x_{ij} / \sum_i x_{ij}$
Acceptable domain of x: ≥ 0. Range of f(x): 0-1.
- Commonly used with species data to adjust for unequal abundances among species
- Equalizes areas under curves of species response profiles
- Relative abundance profiles of samples depend on species relative abundances across all sites
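The two column standardizations above can be sketched as follows, assuming rows are sample units and columns are variables (species), that no column is constant (for the z-score), and that no column sums to zero (for the total). Function names and data are illustrative.

```python
import numpy as np

def column_zscore(data):
    """Center each column on its mean and divide by its sample standard deviation."""
    data = np.asarray(data, dtype=float)
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

def column_total(data):
    """Divide each element by its column sum, so that every column sums to 1."""
    data = np.asarray(data, dtype=float)
    return data / data.sum(axis=0)

abund = np.array([[1.0, 10.0],
                  [2.0, 30.0],
                  [3.0, 60.0]])
z = column_zscore(abund)      # each column now has mean 0 and variance 1
rel = column_total(abund)     # each column now sums to 1
```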
Column Max Standardization: $b_{ij} = x_{ij} / \max(x_j)$
Acceptable domain of x: ≥ 0. Range of f(x): 0-1.
- Similar to column total, except:
  - Equalizes heights of peaks of species response curves
  - Based on extreme values, which can introduce noise
  - Can exacerbate importance of rare species

[Figure: frequency against abundance (count) by species. Column total standardization equalizes the area under the curves; column max standardization equalizes the peaks of the curves.]
Row Total Standardization: $b_{ij} = x_{ij} / \sum_j x_{ij}$
Acceptable domain of x: ≥ 0. Range of f(x): 0-1.
- Commonly used with species data to adjust for unequal abundances among sample units
- Equalizes areas under curves of sample unit profiles
- Shifts emphasis to relative abundance within a sample unit
- Relative abundance profiles of samples are independent

Row Max Standardization: $b_{ij} = x_{ij} / \max(x_i)$
Acceptable domain of x: ≥ 0. Range of f(x): 0-1.
- Similar to row total, except:
  - Equalizes heights of peaks of sample unit profiles
  - Based on extreme values, which can introduce noise
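A minimal sketch of the two row standardizations, assuming non-negative data in which every sample unit (row) has at least one non-zero value. Function names and data are illustrative.

```python
import numpy as np

def row_total(data):
    """Divide each element by its row sum (relative abundance within a sample unit)."""
    data = np.asarray(data, dtype=float)
    return data / data.sum(axis=1, keepdims=True)

def row_max(data):
    """Divide each element by its row maximum, so the peak of every row is 1."""
    data = np.asarray(data, dtype=float)
    return data / data.max(axis=1, keepdims=True)

abund = np.array([[2.0, 6.0, 2.0],
                  [0.0, 5.0, 15.0]])
by_total = row_total(abund)
by_max = row_max(abund)
```

Because each row is divided only by its own total or maximum, the result for one sample unit does not depend on any other sample unit.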
Wisconsin Double Standardization: first $b_{ij} = x_{ij} / \max(x_j)$ (column max), then $b'_{ij} = b_{ij} / \sum_j b_{ij}$ (row total)
Acceptable domain of x: ≥ 0. Range of f(x): 0-1.
- First standardize by species (column) maxima, then by row totals
- Equalizes emphasis among sample units and among species
- Appealing, but comes at the cost of diminishing the intuitive meaning of individual data values

Standardizations: Some Rules of Thumb
- Effect of standardization on analysis depends on variability among rows and/or columns
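The two-step Wisconsin double standardization can be sketched as below, assuming every species (column) and every sample unit (row) has at least one non-zero value. The function name and data are illustrative.

```python
import numpy as np

def wisconsin(data):
    """Standardize by column maxima, then by row totals."""
    data = np.asarray(data, dtype=float)
    by_col_max = data / data.max(axis=0)                        # step 1: species maxima
    return by_col_max / by_col_max.sum(axis=1, keepdims=True)   # step 2: row totals

abund = np.array([[10.0, 1.0],
                  [5.0, 2.0],
                  [0.0, 4.0]])
double_std = wisconsin(abund)
print(double_std)
```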
Standardizations: Some Rules of Thumb (continued)
- Consider row standardizations for species data sets, commonly:
  - Row normalize (Euclidean distance (ED) = chord distance) (Legendre and Gallagher)
  - Row chi.square (ED = chi-square distance of …/…)
  - Row total (ED = species profile distance)
  - Row Hellinger (ED = Hellinger distance)
- Consider column standardizations to equalize variables measured in different units and scales, commonly:
  - Column standardize (z-scores = zero mean and unit variance)
  - Column normalize (uncentered with unit variance)
  - Column total (col sums = 1)
  - Column range (col range 0-1)
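To make the link between row standardizations and distances concrete, the sketch below normalizes rows to unit length and then takes the ordinary Euclidean distance, which gives the chord distance. It also shows the Hellinger standardization. Function names and data are illustrative, and rows are assumed to contain at least one non-zero value.

```python
import numpy as np

def row_normalize(data):
    """Scale each row so that its sum of squares equals 1."""
    data = np.asarray(data, dtype=float)
    return data / np.sqrt((data ** 2).sum(axis=1, keepdims=True))

def row_hellinger(data):
    """Square root of the row-total standardized values."""
    data = np.asarray(data, dtype=float)
    return np.sqrt(data / data.sum(axis=1, keepdims=True))

def euclidean(u, v):
    """Euclidean distance between two vectors."""
    return float(np.sqrt(((u - v) ** 2).sum()))

abund = np.array([[3.0, 4.0],
                  [4.0, 3.0],
                  [30.0, 40.0]])
chord = row_normalize(abund)
d_chord_12 = euclidean(chord[0], chord[1])   # different proportions, so distance > 0
d_chord_13 = euclidean(chord[0], chord[2])   # same proportions, so distance is 0
hell = row_hellinger(abund)
```

Sample units 1 and 3 differ tenfold in total abundance but have identical species proportions, so their chord distance is zero: the row standardization has removed the effect of total quantity.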
Standardizations: Some Rules of Thumb (continued)
- Standardizations may not matter depending on subsequent analysis, e.g.:
  - Principal components analysis of a correlation matrix has a built-in column standardization
  - Correspondence analysis of a species data set has essentially a built-in chi-square standardization
- There is no theoretical basis for selecting the best standardization; justify the choice on biological grounds and perhaps conduct sensitivity analysis

Data Screening for Outliers
- What are outliers?
  - Sample units with extreme values for individual variables (univariate outliers) or sample units with an unusual combination of values for more than one variable (multivariate outliers).
- Why worry about outliers?
  - Outliers can have a large effect on the outcome of an analysis and therefore can lead to erroneous conclusions.
Data Screening for Outliers: Univariate and Multivariate
- Univariate outliers:
  - Examine sample standard deviation scores on each variable separately. Standard deviation scores > … mark extreme observations.

[Table: standard deviation scores for each sample unit on each variable, with extreme observations marked.]

- Multivariate outliers:
  - Examine deviations of the sample average distances to other samples. Standard deviation scores > … mark extreme observations.

[Figure: standard deviation scores of average distances, with extreme observations marked.]
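Both screening steps above can be sketched with NumPy. The cut-off `threshold` is a hypothetical parameter (the value used on the slides is not legible in this transcript), Euclidean distance is an assumption for the multivariate step, and the data are made up so that the last sample unit is extreme on the first variable.

```python
import numpy as np

def univariate_outliers(data, threshold):
    """Flag values whose column z-score exceeds the threshold in absolute value."""
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    return np.abs(z) > threshold

def average_distance_scores(data):
    """Standard deviation scores of each sample's mean Euclidean distance to the others."""
    data = np.asarray(data, dtype=float)
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))      # full pairwise distance matrix
    n = data.shape[0]
    avg = dist.sum(axis=1) / (n - 1)             # self-distance is 0, so divide by n - 1
    return (avg - avg.mean()) / avg.std(ddof=1)

data = np.array([[10, 1], [11, 2], [9, 3], [10, 4], [12, 5],
                 [10, 6], [11, 7], [9, 8], [10, 9], [50, 10]], dtype=float)
flags = univariate_outliers(data, threshold=2.0)
scores = average_distance_scores(data)
```

With only ten sample units no z-score can exceed roughly 2.85, so a cut-off of 2.0 is used here purely for illustration; larger datasets can support stricter cut-offs.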
Data Screening for Outliers: Multivariate (continued)
- Multivariate outliers:
  - Examine each sample's Mahalanobis distance to the group of remaining samples.
  - Examine results of subsequent analyses for extreme values (e.g., isolated points in ordination plots, single-member clusters in cluster analysis, etc.)
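One way to compute each sample's Mahalanobis distance to the group of remaining samples is a leave-one-out loop, sketched below. The function name and data are illustrative; a pseudo-inverse is used so the sketch still runs if the covariance matrix of the remaining samples happens to be singular, which is a design choice of this sketch rather than something stated on the slides.

```python
import numpy as np

def leave_one_out_mahalanobis(data):
    """Mahalanobis distance of each sample to the centroid of all other samples."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    distances = np.empty(n)
    for i in range(n):
        rest = np.delete(data, i, axis=0)                 # the remaining samples
        center = rest.mean(axis=0)
        cov = np.atleast_2d(np.cov(rest, rowvar=False))   # their covariance matrix
        diff = data[i] - center
        distances[i] = np.sqrt(diff @ np.linalg.pinv(cov) @ diff)
    return distances

data = np.array([[10, 1], [11, 2], [9, 3], [10, 4], [12, 5],
                 [10, 6], [11, 7], [9, 8], [10, 9], [50, 10]], dtype=float)
d = leave_one_out_mahalanobis(data)
most_extreme = int(np.argmax(d))
```

Leaving the focal sample out of the centroid and covariance estimates prevents an extreme sample from masking itself by inflating the group's variance.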
Data Screening for Outliers: Some Rules of Thumb
- Examine data at all stages of analysis (i.e., input data, transformed/standardized data, ecological distance matrix, results of analysis) for extreme values
- Be aware of the potential impact of extreme values in the chosen analysis
- Delete extreme values only if justifiable on ecological grounds
- Conduct sensitivity analysis
More informationStatistical Concepts. Constructing a Trend Plot
Module 1: Review of Basic Statistical Concepts 1.2 Plotting Data, Measures of Central Tendency and Dispersion, and Correlation Constructing a Trend Plot A trend plot graphs the data against a variable
More information2 Naïve Methods. 2.1 Complete or available case analysis
2 Naïve Methods Before discussing methods for taking account of missingness when the missingness pattern can be assumed to be MAR in the next three chapters, we review some simple methods for handling
More informationContents. Acknowledgments. xix
Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables
More informationSPECIES RESPONSE CURVES! Steven M. Holland" Department of Geology, University of Georgia, Athens, GA " !!!! June 2014!
SPECIES RESPONSE CURVES Steven M. Holland" Department of Geology, University of Georgia, Athens, GA 30602-2501" June 2014 Introduction Species live on environmental gradients, and we often would like to
More informationPacing (based on a 45- minute class period) Days: 17 days
Days: 17 days Math Algebra 1 SpringBoard Unit 1: Equations and Inequalities Essential Question: How can you represent patterns from everyday life by using tables, expressions, and graphs? How can you write
More informationStatistical View of Least Squares
May 23, 2006 Purpose of Regression Some Examples Least Squares Purpose of Regression Purpose of Regression Some Examples Least Squares Suppose we have two variables x and y Purpose of Regression Some Examples
More informationResistant Measure - A statistic that is not affected very much by extreme observations.
Chapter 1.3 Lecture Notes & Examples Section 1.3 Describing Quantitative Data with Numbers (pp. 50-74) 1.3.1 Measuring Center: The Mean Mean - The arithmetic average. To find the mean (pronounced x bar)
More informationCS281 Section 4: Factor Analysis and PCA
CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we
More informationAdvanced Quantitative Data Analysis
Chapter 24 Advanced Quantitative Data Analysis Daniel Muijs Doing Regression Analysis in SPSS When we want to do regression analysis in SPSS, we have to go through the following steps: 1 As usual, we choose
More informationDRAFT EAST POINSETT CO. SCHOOL DIST. - ALGEBRA I MATH
Module 1 - Math Test: 10/15/2015 Interpret the structure of expressions. AI.A.SSE.1 * Interpret expressions that represent a quantity in terms of its context. [Focus on linear, exponential, and quadratic
More informationMathematics. Number and Quantity The Real Number System
Number and Quantity The Real Number System Extend the properties of exponents to rational exponents. 1. Explain how the definition of the meaning of rational exponents follows from extending the properties
More information3 GRAPHICAL DISPLAYS OF DATA
some without indicating nonnormality. If a sample of 30 observations contains 4 outliers, two of which are extreme, would it be reasonable to assume the population from which the data were collected has
More informationGuide Assessment Structure Algebra I
Guide Assessment Structure Algebra I The Common Core State Standards for Mathematics are organized into Content Standards which define what students should understand and be able to do. Related standards
More informationMATRICES. a m,1 a m,n A =
MATRICES Matrices are rectangular arrays of real or complex numbers With them, we define arithmetic operations that are generalizations of those for real and complex numbers The general form a matrix of
More informationCHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA
Examples: Multilevel Modeling With Complex Survey Data CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Complex survey data refers to data obtained by stratification, cluster sampling and/or
More informationMathematics Standards for High School Algebra I
Mathematics Standards for High School Algebra I Algebra I is a course required for graduation and course is aligned with the College and Career Ready Standards for Mathematics in High School. Throughout
More informationDover- Sherborn High School Mathematics Curriculum Probability and Statistics
Mathematics Curriculum A. DESCRIPTION This is a full year courses designed to introduce students to the basic elements of statistics and probability. Emphasis is placed on understanding terminology and
More informationUNIT 3 CONCEPT OF DISPERSION
UNIT 3 CONCEPT OF DISPERSION Structure 3.0 Introduction 3.1 Objectives 3.2 Concept of Dispersion 3.2.1 Functions of Dispersion 3.2.2 Measures of Dispersion 3.2.3 Meaning of Dispersion 3.2.4 Absolute Dispersion
More informationObservations Homework Checkpoint quizzes Chapter assessments (Possibly Projects) Blocks of Algebra
September The Building Blocks of Algebra Rates, Patterns and Problem Solving Variables and Expressions The Commutative and Associative Properties The Distributive Property Equivalent Expressions Seeing
More informationFLORIDA STANDARDS TO BOOK CORRELATION
FLORIDA STANDARDS TO BOOK CORRELATION Florida Standards (MAFS.912) Conceptual Category: Number and Quantity Domain: The Real Number System After a standard is introduced, it is revisited many times in
More informationCHAPTER 5. Outlier Detection in Multivariate Data
CHAPTER 5 Outlier Detection in Multivariate Data 5.1 Introduction Multivariate outlier detection is the important task of statistical analysis of multivariate data. Many methods have been proposed for
More informationTopic 1. Definitions
S Topic. Definitions. Scalar A scalar is a number. 2. Vector A vector is a column of numbers. 3. Linear combination A scalar times a vector plus a scalar times a vector, plus a scalar times a vector...
More informationIntroducing Generalized Linear Models: Logistic Regression
Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and
More informationStatistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018
Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical
More information