Statistics Toolbox 6. Apply statistical algorithms and probability models

Similar documents
Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Experimental Design and Data Analysis for Biologists

Practical Statistics for the Analytical Scientist Table of Contents

sphericity, 5-29, 5-32 residuals, 7-1 spread and level, 2-17 t test, 1-13 transformations, 2-15 violations, 1-19

Course in Data Science

Contents. Acknowledgments. xix

Subject CS1 Actuarial Statistics 1 Core Principles

Types of Statistical Tests DR. MIKE MARRAPODI

Appendix A Summary of Tasks. Appendix Table of Contents

Textbook Examples of. SPSS Procedure

Chapter Fifteen. Frequency Distribution, Cross-Tabulation, and Hypothesis Testing

Appendix F. Computational Statistics Toolbox. The Computational Statistics Toolbox can be downloaded from:

STATISTICS-STAT (STAT)

Applied Multivariate Statistical Analysis Richard Johnson Dean Wichern Sixth Edition

HANDBOOK OF APPLICABLE MATHEMATICS

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of

IDL Advanced Math & Stats Module

Exam details. Final Review Session. Things to Review

Glossary for the Triola Statistics Series

From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author...

Workshop Research Methods and Statistical Analysis

3 Joint Distributions 71

Multivariate Analysis of Ecological Data using CANOCO

Statistical. Psychology

Chart types and when to use them

Introduction to Basic Statistics Version 2

1 Introduction to Minitab

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami

THE PRINCIPLES AND PRACTICE OF STATISTICS IN BIOLOGICAL RESEARCH. Robert R. SOKAL and F. James ROHLF. State University of New York at Stony Brook

Review of Statistics 101

Turning a research question into a statistical question.

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics

Bivariate Relationships Between Variables

Nonparametric statistic methods. Waraphon Phimpraphai DVM, PhD Department of Veterinary Public Health

Sensitivity Analysis with Correlated Variables

Introduction to Statistical Analysis

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p.

STATISTICS ( CODE NO. 08 ) PAPER I PART - I

Statistics and Measurement Concepts with OpenStat

Principal Component Analysis, A Powerful Scoring Technique

Basic Statistical Analysis

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

Introduction to Statistical Analysis using IBM SPSS Statistics (v24)

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION

The entire data set consists of n = 32 widgets, 8 of which were made from each of q = 4 different materials.

Contents. Part I: Fundamentals of Bayesian Inference 1

Application of mathematical, statistical, graphical or symbolic methods to maximize chemical information.

Data Mining. 3.6 Regression Analysis. Fall Instructor: Dr. Masoud Yaghini. Numeric Prediction

Intuitive Biostatistics: Choosing a statistical test

Rank-Based Methods. Lukas Meier

AIM HIGH SCHOOL. Curriculum Map W. 12 Mile Road Farmington Hills, MI (248)

Algebra I. Standard Processes or Content Strand CLE Course Level Expectation SPI State Performance Indicator Check for Understanding

Midwest Big Data Summer School: Introduction to Statistics. Kris De Brabanter

CHI SQUARE ANALYSIS 8/18/2011 HYPOTHESIS TESTS SO FAR PARAMETRIC VS. NON-PARAMETRIC

Basics of Multivariate Modelling and Data Analysis

Robustness and Distribution Assumptions

Basic Statistical Tools

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.

A User's Guide To Principal Components

SPSS LAB FILE 1

Descriptive Data Summarization

* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course.

Learning Objectives for Stat 225

What is a Hypothesis?

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

DISCOVERING STATISTICS USING R

Finding Relationships Among Variables

Algebra 2. Curriculum (524 topics additional topics)

An Alternative Algorithm for Classification Based on Robust Mahalanobis Distance

Pattern Recognition and Machine Learning

Meelis Kull Autumn Meelis Kull - Autumn MTAT Data Mining - Lecture 03

Introduction to inferential statistics. Alissa Melinger IGK summer school 2006 Edinburgh

Similarity and Dissimilarity

Testing Statistical Hypotheses

Multilevel Statistical Models: 3 rd edition, 2003 Contents

ECLT 5810 Data Preprocessing. Prof. Wai Lam

Regression Analysis. BUS 735: Business Decision Making and Research

BNG 495 Capstone Design. Descriptive Statistics

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Background to Statistics

ENVIRONMENTAL DATA ANALYSIS WILLIAM MENKE JOSHUA MENKE WITH MATLAB COPYRIGHT 2011 BY ELSEVIER, INC. ALL RIGHTS RESERVED.

PATTERN CLASSIFICATION

Chapter 8 Statistical Quality Control, 7th Edition by Douglas C. Montgomery. Copyright (c) 2013 John Wiley & Sons, Inc.

Using Mixture Latent Markov Models for Analyzing Change in Longitudinal Data with the New Latent GOLD 5.0 GUI

STATISTICS ANCILLARY SYLLABUS. (W.E.F. the session ) Semester Paper Code Marks Credits Topic

Unsupervised Learning Methods

Spatial Bayesian Nonparametrics for Natural Image Segmentation

Stat 5101 Lecture Notes

The science of learning from data.

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis

Unconstrained Ordination

Practical Statistics

Introduction to statistical modeling

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur

Mathematics. Pre Algebra

Transcription:

Statistics Toolbox 6 Apply statistical algorithms and probability models Statistics Toolbox provides engineers, scientists, researchers, financial analysts, and statisticians with a comprehensive set of tools to assess and understand their data. It includes functions and interactive tools for analyzing historical data, modeling data, simulating systems, developing statistical algorithms, and learning and teaching statistics. The toolbox supports a wide range of tasks, from basic descriptive statistics to developing and visualizing multidimensional nonlinear models. It offers a rich set of statistical plot types and interactive graphics, such as polynomial fitting and response surface modeling. All toolbox functions are written in the open MATLAB language. This means that you can inspect the algorithms, modify the source code, and create your own custom functions. Key features Data organization and management Descriptive statistics Statistical plotting and data visualization Probability distributions Analysis of variance (ANOVA) Linear and nonlinear modeling Multivariate statistics Design of Experiments (DOE) Hypothesis testing Statistical Process Control (SPC) Fitting univariate distributions to data. The Distribution Fitting Tool lets you easily import, analyze, and plot your data. Plots showing how data can be clustered into groups with similar characteristics. Statistics Toolbox includes functions for multivariate analysis and clustering. Accelerating the pace of engineering and science

Data Organization and Management Statistics Toolbox provides two specialized array types categorical arrays and dataset arrays that enhance MATLAB standard data types by enabling convenient organization and analysis of statistical data. Categorical Arrays Categorical arrays let you organize and process categorical data that takes on values from a finite set of discrete levels or categories. With categorical arrays, you can: Store nominal data using descriptive labels, such as red, green, and blue for an unordered set of colors Store ordinal data using descriptive labels, such as cold, warm, and hot for an ordered set of temperature measurements Manipulate categorical data using familiar array operations and indexing methods Index into other variables or create subsets of data based upon the category of observation Group observations of the same category for computing statistics and creating visualizations Dataset Arrays Dataset arrays enable convenient organization and analysis of heterogeneous statistical data and metadata. Dataset arrays have columns that represent different measured variables and rows that represent different observations. With dataset arrays, you can: Collect variables of different data types and sizes in a single array Use metadata to describe variables and observations, and to access them by name View summary statistics and display data in an intuitive tabular format Create, manage, and operate on dataset arrays using a variety of supporting methods Descriptive Statistics Descriptive statistics methods enable you to quickly understand and describe potentially large sets of data. Statistics Toolbox includes functions for calculating: Measures of central tendency (measures of location), including average, median, and various means Measures of dispersion (measures of spread), including range, variance, standard deviation, and mean or median absolute deviation Linear and rank correlation (partial and full) Results based on data with missing values Percentile and quartile estimates Bootstrap statistics Regression analysis to determine the most important ingredients for cement-mixture curing. Stepwise regression capabilities in Statistics Toolbox provide automated procedures for identifying models from several potential explanatory variables. Density estimates (using a kernel-smoothing function) These functions help you summarize the values in a data sample with a few highly relevant numbers. Statistical Plotting and Interactive Graphics Statistics Toolbox includes numerous functions that help you represent your data graphically. In addition to the standard set of MATLAB plot types, Statistics Toolbox includes box plots, probability plots, histograms and 3-D histograms, control charts, quantile-quantile plots, and several multivariate plots. It also provides interactive graphics that enhance analysis in areas such as: Nonlinear and polynomial fitting and prediction Exploration of distribution functions and distribution fitting and analysis Interactive random number generation Response surface modeling Interactive process experimentation and analysis Stepwise regression analysis

Analysis of covariance (ANOCOVA) tool, which plots data to assess group-togroup differences and the impact of a predictor variable on a response variable. Probability Distributions Statistics Toolbox includes an extensive library of probability distribution functions that let you fit probability distributions to your data, compute functions from them, and generate random samples. From the command line you have access to more than 150 different functions for: Calculating the probability density function (pdf) Calculating the cumulative distribution function (cdf) and its inverse Computing mean and variance Estimating distribution parameters Generating random numbers Statistics Toolbox includes three interactive graphical user interfaces (GUIs) that simplify common analysis tasks. The Distribution Fitting Tool GUI lets you fit data using 21 predefined probability distributions, a nonparametric (kernel-smoothing) estimator, or a custom distribution that you define yourself. It supports both complete and censored (reliability) data and lets you exclude data, save and load sessions, and generate M-code. The Distribution Tool GUI lets you learn about a variety of probability distributions and explore how parameters affect their position and shape. The Random Number Tool GUI provides a random number generator to simulate behavior associated with particular distributions. You can use this random data to test hypotheses or models under different conditions, as well as perform Monte-Carlo simulations. Statistics Toolbox also includes functions for generating random samples from multivariate distributions, such as t, normal, copulas, and Wishart; sampling from finite populations; performing Latin hypercube sampling; and generating samples from Pearson and Johnson systems of distributions. Analysis of Variance Analysis of variance (ANOVA) lets you determine whether data sets from different groups have different characteristics. You can classify groups using discrete predictor variables. A follow-up multiple comparisons analysis can pinpoint which pairs of groups differ from each other. Statistics Toolbox includes algorithms for ANOVA and related techniques, including: One-way ANOVA with graphics Two-way ANOVA for balanced data Multiway ANOVA for balanced and unbalanced data (both fixed and random effects) Analysis of covariance (ANOCOVA) One-way multivariate ANOVA Nonparametric one- and two-way ANOVA (Kruskal-Wallis, Friedman) Multiple comparison of group means, slopes, and intercepts

Matrix of scatter plots and histograms comparing automobile performance over three model years (left). Statistics Toolbox makes it easy to plot multiple variables and compare data. Parallel coordinates plot of multivariate data describing throttle performance (right). Statistics Toolbox provides convenient tools for visualizing high-dimensional data. Linear and Nonlinear Modeling The linear and nonlinear models provided in Statistics Toolbox let you model a response variable as a function of one or more predictor variables. These models make predictions, establish relationships between variables, or simplify a problem. For example, linear and nonlinear regression models help establish which variables have the most impact on a response. Robust regression methods can help you find outliers and reduce their effect on the fitted model. The toolbox provides linear algorithms for: One-way, two-way, and multiway ANOVA Mixed random and fixed-effects ANOVA Polynomial, stepwise, ridge, robust, and multiple linear regression Generalized linear models, including multinomial (discrete-choice) models Response surface fitting The toolbox provides nonlinear fitting functions for classification and regression trees and for nonlinear least squares. Using nonlinear least squares functions, you can: Estimate parameters Interactively visualize and predict multidimensional nonlinear fitting Set confidence intervals for parameters and predicted values You can also use the toolbox to work with Hidden Markov models. You can estimate the parameters of a model using the Baum- Welch algorithm, calculate the most likely path through a model using the Viterbi algorithm, and generate random sequences from a given model. Multivariate Statistics Multivariate statistics methods let you analyze your data by evaluating groups of variables together. You can: Segment data in clusters for further analysis Visualize and assess the group-to-group differences in a data set Reduce a large set of variables to a more manageable but still representative set Multivariate statistics tasks supported by Statistics Toolbox include: Factor analysis Principal components analysis (PCA) Factor rotation and biplots Cluster analysis (both hierarchical and k- means) Discriminant analysis Multivariate ANOVA Multidimensional scaling (classical, metric, and nonmetric) Multivariate plotting (parallel coordinates, glyph plots, and Andrews plots) www.mathworks.com

Design of Experiments Functions for Design of Experiments (DOE) help you create and test practical plans to gather data for statistical modeling. These plans show you how to manipulate your data inputs in tandem to generate information about their effect on the outputs. Supported design types include: Full factorial Fractional factorial Response surface (central composite and Box-Behnken) D-optimal Latin hypercube You can use the toolbox to define, analyze, and visualize a customized DOE. For example, you can estimate input effects and input interactions using ANOVA, linear regression, and response surface modeling, and visualize results through main effect plots, interaction plots, and multi-vari charts. Hypothesis Testing Random variation often makes it difficult to determine whether samples taken under different conditions really are different. Hypothesis testing is an effective tool for analyzing whether sample-to-sample differences are significant and require further evaluation or are consistent with random and expected data variation. Statistics Toolbox supports the most widely used parametric and nonparametric hypothesis testing procedures, such as: One- and two-sample t tests One-sample z test Nonparametric tests for one sample, paired samples, and two independent samples Distribution tests (Chi-square, Jarque-Bera, Lilliefors, and Kolmogorov-Smirnov) Comparison of distributions (two-sample Kolmogorov-Smirnov) Autocorrelation and randomness tests Linear hypotheses tests on regression coefficients Model of a chemical reaction of an experiment using Design-of-Experiments (DOE) and surfacefitting capabilities of Statistics Toolbox (above). Fitting a decision tree to data (left). The fitting capabilities in Statistics Toolbox let you visualize a decision tree by drawing a diagram of the decision rule and group assignments.

Control charts showing process data and violations to Western Electric control rules. Statistics Toolbox provides a variety of control charts and control rules for monitoring and evaluating products or processes. Statistical Process Control Statistics Toolbox provides a set of functions that support Statistical Process Control (SPC). These functions enable you to monitor and improve products or processes by evaluating process variability. SPC functions let you: Perform gage repeatability and reproducibility studies Estimate process capability Create eleven different control charts Apply Western Electric and Nelson control rules to control chart data Required Products MATLAB Related Products Bioinformatics Toolbox. Read, analyze, and visualize genomic, proteomic, and microarray data Curve Fitting Toolbox. Perform model fitting and analysis Financial Toolbox. Analyze financial data and develop financial algorithms Optimization Toolbox. Solve standard and large-scale optimization problems Signal Processing Toolbox. Perform signal processing, analysis, and algorithm development For information on related products, visit www.mathworks.com/products/statistics Platform and System Requirements For information on platform and system requirements, visit www.mathworks.com/products/statistics Resources visit www.mathworks.com Technical Support www.mathworks.com/support Online User Community www.mathworks.com/matlabcentral Demos www.mathworks.com/demos Training Services www.mathworks.com/training Third-Party Products and Services www.mathworks.com/connections Worldwide CONTACTS www.mathworks.com/contact e-mail info@mathworks.com Accelerating the pace of engineering and science 2007 MATLAB, Simulink, Stateflow, Handle Graphics, Real-Time Workshop, and xpc TargetBox are registered trademarks and SimBiology, SimEvents, and SimHydraulics are trademarks of The MathWorks, Inc. Other product or brand names are trademarks or registered trademarks of their respective holders. 8079v07 03/07