Experimental design. Matti Hotokka Department of Physical Chemistry Åbo Akademi University
1 Experimental design Matti Hotokka Department of Physical Chemistry Åbo Akademi University
2 Contents: Elementary concepts; Regression; Validation; Hypothesis testing; ANOVA; PCA, PCR, PLS; Clusters, SIMCA; Design of Experiments. Literature: [1] Wonnacott & Wonnacott: Introductory Statistics, Wiley. [2] Snedecor & Cochran: Statistical Methods, Iowa State Univ. Press. [3] Otto: Chemometrics, Wiley.
3 Hypothesis testing. Inference method. Confidence levels. Descriptive statistics; hypothesis testing; predictive statistics.
4 Hypothesis testing. Steps involved: (1) Formulate a null hypothesis. This is what you want to claim, e.g., the sample is within tolerances. (2) Formulate an alternative hypothesis. This is the complement of the null hypothesis, e.g., the sample is not within tolerances. (3) Calculate a characteristic number (the test statistic). (4) Compare it with tabulated values. (5) Accept or reject the null hypothesis.
5 Hypothesis testing. A huge number of tests exist: tests for the mean, tests for the distribution, tests for spread, tests for outliers, etc.
6 Hypothesis testing. Test for the mean: two-sided t-test. Null hypothesis: $\bar{x} = \mu$. Test statistic: $t = \frac{|\bar{x} - \mu|}{s}\sqrt{n}$. [Figure: distribution P(X) against x; the central region is marked "Acceptable" and both tails "No-no".]
7 Hypothesis testing. Mean at a nominal value. The ibuprofen concentration must be 400 mg per pill. Therefore $\mu = 400$ mg. Take 5 pills and measure the ibuprofen content. The results are 396, 388, 398, 382, 373 mg. Mean $\bar{x} = 387$ mg, $s = 10.3$ mg. Calculate the test statistic, $t = 2.82$. Degrees of freedom = n - 1 = 4. Choose the risk level: 5 % (95 % confidence). Read the table for Student's t-test at risk level 0.025, because a risk of 2.5 % at the low end and 2.5 % at the high end gives a total risk of 5 %. The value in the table, 2.776, is smaller than the calculated one. Reject the null hypothesis. Accept the alternative hypothesis. We cannot guarantee at the 95 % confidence level that the pills have the prescribed amount of ibuprofen.
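The ibuprofen example can be retraced numerically. The following is a minimal sketch using only the Python standard library; the function name `one_sample_t` is hypothetical, and the critical value 2.776 is copied from the slide rather than computed. It evaluates the statistic twice: once from the raw data and once from the rounded inputs (387 mg, 10.3 mg) quoted on the slide.

```python
import math
import statistics

def one_sample_t(values, mu):
    """Return (mean, s, t) for a test of the sample mean against mu."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)              # sample standard deviation (n - 1)
    t = abs(mean - mu) / s * math.sqrt(n)     # two-sided: only the size of the gap matters
    return mean, s, t

pills = [396, 388, 398, 382, 373]             # mg, from the slide
mean, s, t = one_sample_t(pills, mu=400)

# Same statistic from the rounded mean and standard deviation quoted on the slide
t_rounded = abs(387 - 400) / 10.3 * math.sqrt(5)

T_CRIT = 2.776                                # tabulated, d.f. = 4, risk 0.025 per tail
print(f"mean = {mean:.1f} mg, s = {s:.1f} mg")
print(f"t (raw data) = {t:.2f}, t (rounded inputs) = {t_rounded:.2f}, tabulated = {T_CRIT}")
```

The rounded inputs reproduce the slide's t = 2.82. The unrounded mean is 387.4 mg, which gives t of about 2.74, slightly below 2.776, so the decision in this example sits very close to the critical value and is sensitive to rounding.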
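8 Student's distribution. [Table: critical values of t by degrees of freedom (D.f.) and risk level; values not recoverable.] N = number of samples. D.f. = degrees of freedom = N - 1. This table is one-sided. Therefore, in a two-sided test, the total risk at level 0.025 is 2.5 % + 2.5 % = 5 % and the confidence probability is 95 %.
9 Hypothesis testing. Test for the mean: one-sided t-test. Null hypothesis: $\bar{x} \le \mu$. Test statistic: $t = \frac{\bar{x} - \mu}{s}\sqrt{n}$. [Figure: distribution P(X) against x; the region up to the limit is marked "Acceptable" and the upper tail "No-no".]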
10 Hypothesis testing. Mean below a nominal value. The EU regulatory limit for nitrate in drinking water is 50 mg/l. Determinations from 4 parallel samples gave the results 51.0, 51.3, 51.6, 50.9 mg/l. Is this just random variation, or is the observed level systematically above the prescribed limit? Mean 51.2 mg/l and standard deviation 0.316 mg/l. Null hypothesis: the level is not exceeded, $\bar{x} \le \mu$; alternative hypothesis: it is too high, $\bar{x} > \mu$. Calculate $t = 7.59$. Choose the risk level: 5 %. D.f. = 4 - 1 = 3. The tabulated value of t, 2.353, is smaller than the calculated one. The null hypothesis must be rejected. The concentration is too high.
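The nitrate example can be checked the same way. This is a small sketch with the Python standard library; the variable names are illustrative, and the critical value 2.353 is taken from the slide, not computed.

```python
import math
import statistics

nitrate = [51.0, 51.3, 51.6, 50.9]        # mg/l, from the slide
limit = 50.0                              # regulatory limit, mg/l

n = len(nitrate)
mean = statistics.mean(nitrate)
s = statistics.stdev(nitrate)             # sample standard deviation (n - 1)

# One-sided test: the sign is kept, because only an excess over the
# limit counts as evidence against the null hypothesis.
t = (mean - limit) / s * math.sqrt(n)

T_CRIT = 2.353                            # tabulated, d.f. = 3, one-sided risk 0.05
reject_h0 = t > T_CRIT
print(f"mean = {mean:.1f} mg/l, s = {s:.3f} mg/l, t = {t:.2f}, reject H0: {reject_h0}")
```

Because the whole 5 % risk is placed in one tail, the table column is read at 0.05 here, not at 0.025 as in the two-sided case.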
11 Hypothesis testing. Compare two means. Compare two sets of parallel measurements from different samples. Do the two samples differ significantly? A two-sided test. $t = \frac{|\bar{x}_1 - \bar{x}_2|}{s_d}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$, with $s_d = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$ and D.f. $= n_1 + n_2 - 2$.
12 Hypothesis testing. Do two production batches differ? Quality control tests the day and night shifts at a refinery. The octane numbers of parallel measurements are (1: day) 94.92, 95.07, 94.96, 95.02, 94.99, 94.93; (2: night) 95.03, 95.08, 94.98, 95.03, 95.01, [sixth value not recoverable]. Means: (1) 94.98; (2) [not recoverable]. St.dev.: (1) 0.057; (2) [not recoverable]. Weighted st.dev. = [not recoverable]. Student's t = [not recoverable]; d.f. = 10. Choose risk level 2.5 %, read column 0.025: t = 2.228. Comparison: the calculated t is smaller than the tabulated one. No, we cannot say that the two results differ. Therefore only random variations are observed.
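The two-sample formulas can be sketched as follows. The function name `pooled_t` is hypothetical. Only five of the six night-shift values survive in this transcript, so the run below has d.f. = 9 rather than the slide's 10 and its numbers are not the slide's results; it only illustrates the calculation.

```python
import math
import statistics

def pooled_t(a, b):
    """Return (s_d, t, dof) for comparing the means of two samples."""
    n1, n2 = len(a), len(b)
    s1, s2 = statistics.stdev(a), statistics.stdev(b)
    dof = n1 + n2 - 2
    # Weighted (pooled) standard deviation of the two samples
    s_d = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / dof)
    diff = abs(statistics.mean(a) - statistics.mean(b))
    t = diff / s_d * math.sqrt(n1 * n2 / (n1 + n2))
    return s_d, t, dof

day = [94.92, 95.07, 94.96, 95.02, 94.99, 94.93]
night = [95.03, 95.08, 94.98, 95.03, 95.01]   # sixth value missing in the transcript

s_d, t, dof = pooled_t(day, night)
print(f"s_d = {s_d:.3f}, t = {t:.2f}, d.f. = {dof}")
```

The resulting t is then compared with the tabulated value for the matching degrees of freedom, read at 0.025 per tail for a two-sided 5 % test.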
13 Hypothesis testing. Dixon's Q test for outliers. Can also be applied to very few observations. Arrange your n observations in ascending order. Calculate the numbers $Q_1$ and $Q_n$: $Q_1 = \frac{x_2 - x_1}{x_n - x_1}$; $Q_n = \frac{x_n - x_{n-1}}{x_n - x_1}$. Null hypothesis: not an outlier. It is accepted if the calculated Q is less than the tabulated one.
14 Hypothesis testing. Dixon's Q test for outliers. Critical values of the Q test at the 1 % risk level. Number of observations = n. [Table: n against critical Q; values not recoverable.]
15 Hypothesis testing. Dixon's test for outliers. People of the following ages take part in a bus trip to the theatre in Helsinki: 6, 7, 5, 6, 7, 6, 103, 8, 7, 5. Order them: 5, 5, 6, 6, 6, 7, 7, 7, 8, 103. $Q_1 = 0$, so 5 is not an outlier; $Q_n = 0.969$, so 103 certainly is an outlier.
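The bus-trip example can be reproduced in a few lines. This is a sketch; `dixon_q` is a hypothetical helper, and it assumes the observations are not all identical (otherwise the range in the denominator is zero).

```python
def dixon_q(values):
    """Return (Q_1, Q_n) for the smallest and largest observation."""
    x = sorted(values)
    spread = x[-1] - x[0]                 # total range x_n - x_1
    q_low = (x[1] - x[0]) / spread        # gap below the second-smallest value
    q_high = (x[-1] - x[-2]) / spread     # gap above the second-largest value
    return q_low, q_high

ages = [6, 7, 5, 6, 7, 6, 103, 8, 7, 5]   # from the slide
q_1, q_n = dixon_q(ages)
print(f"Q_1 = {q_1:.3f}, Q_n = {q_n:.3f}")
```

Each Q is then compared with the tabulated critical value for n = 10 observations.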
16 Hypothesis testing. Grubbs' test for outliers. Observation $x^*$ is not an outlier in a series if $T = \frac{|\bar{x} - x^*|}{s} < T_{\text{tabulated}}$.
17 Hypothesis testing. Grubbs' outlier test. Critical values for Grubbs' outlier test at the 95 % and 99 % levels. Number of observations = n. [Table: n against T(95%) and T(99%); values not recoverable.]
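The Grubbs statistic is equally short to compute. As an illustration only, the sketch below applies it to the ages from the Dixon example; the slides do not work this case, and `grubbs_t` is a hypothetical name.

```python
import statistics

def grubbs_t(values, suspect):
    """Return T = |mean - suspect| / s, with mean and s taken over all values."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)          # sample standard deviation (n - 1)
    return abs(mean - suspect) / s

ages = [6, 7, 5, 6, 7, 6, 103, 8, 7, 5]   # reused from the Dixon example
T = grubbs_t(ages, suspect=103)
print(f"T = {T:.2f}")
```

The suspect value is included when the mean and standard deviation are calculated; T is then compared with the tabulated value for the chosen level and n.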
18 Hypothesis testing. Outliers in linear regression. To find out whether observation k (value $y_k$) is an outlier: 1) Calculate a new regression with observation k removed. 2) Calculate $e_k = y_k^{\text{obs}} - y_k^{\text{calc}}$.
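The two steps above can be sketched with an ordinary least-squares line. The data are hypothetical and the helper names `fit_line` and `deleted_residual` are illustrative. The transcript ends after step 2, so how $e_k$ is judged against a critical value is not shown here.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def deleted_residual(xs, ys, k):
    """Residual e_k of point k against a line fitted without point k."""
    xs_k = xs[:k] + xs[k + 1:]            # step 1: remove observation k
    ys_k = ys[:k] + ys[k + 1:]
    slope, intercept = fit_line(xs_k, ys_k)
    y_calc = slope * xs[k] + intercept
    return ys[k] - y_calc                 # step 2: e_k = y_obs - y_calc

# Hypothetical data: y = 2x everywhere except at index 2
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 9.0, 8.0, 10.0]
e_k = deleted_residual(xs, ys, k=2)
print(f"e_k = {e_k:.2f}")
```

Refitting without the suspect point keeps that point from pulling the line towards itself, which would otherwise shrink its own residual.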
20 ANOVA. Analysis of variance. Used to test interdependencies between batches. Used as an analysis tool for designed experiments. Requires several parallel measurements (replicates) of each batch (or experiment).
21 Anova. One-way analysis. Assume that four different samples are taken from the waste water of a factory to study the potassium concentration (mg/l). Each sample is analysed by a different crew. Three parallel measurements are made to determine the concentration of each sample. [Table: replicates by batch, with batch means; values not recoverable.]
22 Anova. Variation between samples. [Table: replicates by batch, with batch means and the total mean $\bar{y}_{\text{total}}$; values not recoverable.] $SSQ_{\text{fact}} = 0.489$.
23 Anova. Variation between samples. $SSQ_{\text{fact}} = \sum_{j=1}^{q} n_j (\bar{y}_j - \bar{y}_{\text{total}})^2$, where q is the number of batches and $n_j$ the number of replicates in batch j.
24 Anova. Variation within samples. $SSQ_R$ is built from the squared deviations $(y_{ij} - \bar{y}_j)^2$.
25 Anova. Variation within samples. $SSQ_R = \sum_{j=1}^{q} \sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2$.
26 Anova. Total variation. $SSQ_{\text{corr}} = 0.749$.
27 Anova. Total variation. $SSQ_{\text{corr}} = \sum_{j=1}^{q} \sum_{i=1}^{n_j} (y_{ij} - \bar{y}_{\text{total}})^2$.
28 Anova. Total variation broken down into contributions. $SSQ_{\text{corr}} = SSQ_{\text{fact}} + SSQ_R$.
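The decomposition of the total variation can be verified numerically. The potassium table did not survive in this transcript, so the sketch below uses hypothetical values for four batches with three replicates each; `anova_sums` is an illustrative name.

```python
def anova_sums(batches):
    """Return (SSQ_fact, SSQ_R, SSQ_corr) for a one-way layout."""
    all_values = [y for batch in batches for y in batch]
    total_mean = sum(all_values) / len(all_values)
    batch_means = [sum(b) / len(b) for b in batches]

    # Between batches: batch means around the total mean, weighted by n_j
    ssq_fact = sum(len(b) * (m - total_mean) ** 2
                   for b, m in zip(batches, batch_means))
    # Within batches: replicates around their own batch mean
    ssq_r = sum((y - m) ** 2
                for b, m in zip(batches, batch_means) for y in b)
    # Total: every replicate around the total mean
    ssq_corr = sum((y - total_mean) ** 2 for y in all_values)
    return ssq_fact, ssq_r, ssq_corr

# Hypothetical potassium concentrations (mg/l), 4 batches x 3 replicates
batches = [
    [10.2, 10.4, 10.0],
    [10.6, 10.8, 10.7],
    [10.3, 10.4, 10.5],
    [10.5, 10.7, 10.4],
]
ssq_fact, ssq_r, ssq_corr = anova_sums(batches)
print(f"SSQ_fact = {ssq_fact:.3f}, SSQ_R = {ssq_r:.3f}, SSQ_corr = {ssq_corr:.3f}")
```

With the values quoted on the slides, the same identity gives $SSQ_R = 0.749 - 0.489 = 0.260$.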
30 PCA. Principal component analysis. PCA finds a direction along which the points lie. [Data matrix X: the columns are the features Ca and pH, the rows are the objects; values not recoverable.]
31 PCA. What does it mean? [Figure: pH (y) plotted against Ca (x), with the principal component along the direction (1 1).]
32 PCA. What is it? PCA classifies the observations. It does not perform any regression. [Data matrix X with groups of rows labelled Low, Medium and High; values not recoverable.]
33 PCA. What does it mean? [Figure: pH (y) plotted against Ca (x), with the principal component along the direction (1 1).]
34 PCA. Next principal component. The next principal direction, with the next largest spread, must be orthogonal to the first one.
35 PCA. The second principal component. [Figure: pH (y) plotted against Ca (x), with principal components along (1 1) and (1 -1).]
36 PCA. How is it done? The direction of largest spread needs to be found. Spread along the coordinate axes is given by the variance-covariance matrix. Its eigenvalue gives the characteristic spread. The corresponding eigenvector gives the direction. Eigenvectors are automatically orthogonal. So, diagonalize the matrix. Only, you don't.
37 PCA. How is it done? Diagonalization gives ALL eigenvalues. You only need the few largest. Use special mathematical techniques instead.
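The slides do not name the special technique. As one possible illustration, the sketch below uses power iteration, which finds only the largest eigenvalue and its eigenvector of the variance-covariance matrix without a full diagonalization. The data and the function names are hypothetical.

```python
import math

def covariance_matrix(rows):
    """Variance-covariance matrix of the columns of rows (n - 1 denominator)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def mat_vec(matrix, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in matrix]

def power_iteration(matrix, steps=200):
    """Return (largest eigenvalue, unit eigenvector) of a symmetric matrix."""
    p = len(matrix)
    v = [1.0 / (i + 1) for i in range(p)]     # arbitrary asymmetric start vector
    for _ in range(steps):
        w = mat_vec(matrix, v)                # multiplying amplifies the dominant direction
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))  # Rayleigh quotient
    return eigenvalue, v

# Hypothetical (Ca, pH) observations lying roughly along the direction (1 1)
X = [[1.0, 1.2], [2.0, 1.9], [3.0, 3.1], [4.0, 3.8]]
C = covariance_matrix(X)
spread, direction = power_iteration(C)
print(f"largest eigenvalue = {spread:.3f}, direction = {direction}")
```

Further components could be obtained by removing the found component from the matrix and repeating; chemometrics software commonly uses iterative schemes of this general kind, but which one the author has in mind is not stated.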
38 PCA. Eigenvalues. The spread of the first component is the largest, that of the second smaller, etc. Two or three components usually explain all the spread down to experimental errors. [Figure: eigenvalue (= spread) against component number; the trailing components are marked "These do not differentiate the observations".]
39 PCA. How is it done, then? Break down the observations X into a product of a scores matrix T and a loadings matrix L: $X = T L^T$, with loadings $L^T = (1\ 1)$. Compare: $y = ax$. [The numerical entries of X and T are not recoverable.] This example is mathematically inconsistent!
40 PCA. Loadings. The loadings matrix tells the direction of the principal component. [Figure: pH (y) plotted against Ca (x), with the principal component $L^T = (1\ 1)$.]
41 PCA. Scores. The scores matrix tells where the points lie along the new coordinate axis. [Figure: pH (y) plotted against Ca (x).]
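The relation $X = T L^T$ can be illustrated with one component. The sketch below uses hypothetical data lying exactly on the direction (1 1); it assumes that X is mean-centred and that the loading vector is normalized to unit length, neither of which is stated on the slides.

```python
import math

X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]      # hypothetical (Ca, pH) rows

# Centre each column on its mean
n, p = len(X), len(X[0])
means = [sum(row[j] for row in X) / n for j in range(p)]
Xc = [[row[j] - means[j] for j in range(p)] for row in X]

# Loading vector: the direction (1 1), scaled to unit length
loading = [1 / math.sqrt(2), 1 / math.sqrt(2)]

# Scores: position of each object along the principal component
scores = [sum(x_j * l_j for x_j, l_j in zip(row, loading)) for row in Xc]

# Rebuild the centred data from scores and loadings: X = T L^T
rebuilt = [[t * l_j for l_j in loading] for t in scores]

print("scores:", [round(t, 3) for t in scores])
print("rebuilt:", [[round(v, 3) for v in row] for row in rebuilt])
```

For these collinear points one component rebuilds the centred data exactly; with real data the remainder would be carried by further components or treated as noise.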
42 PCA. A real case. Hair samples from a crime scene were analyzed. The following elemental compositions of the hairs of the suspects were detected. [Table: Hair against Cu, Mn, Cl, Br and I; values not recoverable.]
43 PCA. Scores. Consider two principal components. [Figure: scores plot, PC2 against PC1.]
44 PCA. Loadings. Loadings tell how much the original variables contribute to the principal component. [Figure: loadings of Cu, Mn, Cl, Br and I on PC1 and PC2.]
45 PCR. Principal component regression. (Multivariate) linear regression along the principal components. Only one (or a few) variable(s). Maximal resolving power.
46 PLS. Partial least squares. Linear regression: $y = x a$. Multivariate regression: $y = X a$. PLS: $y = U Q$.
48 Cluster analysis. Cluster analysis finds observations that are more similar to each other than to observations outside the cluster.
49 Cluster analysis. Distance. Cluster analysis is based on the distance (or similarity) between objects: city-block distance, Euclidean distance, Pearson distance, Mahalanobis distance, ...
50 Cluster analysis. City-block distance. $d_{12} = |x_{11} - x_{21}| + |x_{12} - x_{22}|$. [Figure: objects 1 and 2 in the plane of Feature 1 and Feature 2, with the coordinate differences $x_{11} - x_{21}$ and $x_{12} - x_{22}$ marked.]
51 Cluster analysis. Euclidean distance. $d_{12} = [(x_{11} - x_{21})^2 + (x_{12} - x_{22})^2]^{1/2}$. [Figure: objects 1 and 2 in the plane of Feature 1 and Feature 2.]
52 Cluster analysis. Pearson distance. $d_{ij} = \sqrt{\sum_{k=1}^{K} \frac{(x_{ik} - x_{jk})^2}{s_k^2}}$, where $s_k$ is the standard deviation of feature k.
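The three distance measures can be written side by side. This is a sketch with hypothetical objects and hypothetical feature standard deviations; the Pearson distance is taken here as the Euclidean distance with each feature difference scaled by that feature's standard deviation.

```python
import math

def city_block(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_distance(a, b, s):
    """Euclidean distance with each feature scaled by its standard deviation s_k."""
    return math.sqrt(sum(((x - y) / s_k) ** 2 for x, y, s_k in zip(a, b, s)))

obj_1 = (1.0, 2.0)          # hypothetical object: (feature 1, feature 2)
obj_2 = (4.0, 6.0)
feature_sd = (3.0, 2.0)     # hypothetical standard deviations of the two features

print("city-block:", city_block(obj_1, obj_2))
print("Euclidean: ", euclidean(obj_1, obj_2))
print("Pearson:   ", pearson_distance(obj_1, obj_2, feature_sd))
```

Scaling by the standard deviation keeps a feature with a large numerical range from dominating the distance.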
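53 Cluster analysis. Example data. Concentrations of calcium and phosphate in six blood serum samples (mg per 100 ml). [Table: Object against the features Calcium and Phosphate; values not recoverable.] The Euclidean distance between objects 1 and 2 is $d_{12} = 0.354$.
54 Cluster analysis. Distance matrix. [Distance matrix between the objects, with the smallest distance highlighted; values not recoverable.]
55 Cluster analysis. Second distance matrix. [Distance matrix after the first merge; merged clusters carry an asterisk, e.g. 1* and 4*. Values not recoverable.]
56 Cluster analysis. Third distance matrix. [Distance matrix between 1*, 3, 4* and 5; values not recoverable.]
57 Cluster analysis. Fourth distance matrix. [Distance matrix between 1*, 3 and 5*; values not recoverable.]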
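The sequence of distance matrices corresponds to agglomerative clustering: find the smallest distance, merge that pair, recompute the distances, and repeat. The sketch below shows the idea with hypothetical data. The rule for the distance between merged clusters is not recoverable from the transcript, so single linkage (the smallest member-to-member distance) is assumed here, and `single_linkage` is an illustrative name.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]   # start: every object alone
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage (assumed): closest pair of members
            d = min(euclidean(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a] + clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                            # b > a, so index a stays valid
    return clusters, merges

# Hypothetical objects described by two features
points = [(1.0, 1.0), (1.2, 1.1), (2.0, 2.0),
          (8.0, 8.0), (8.3, 8.1), (9.0, 9.5)]
clusters, merges = single_linkage(points, n_clusters=2)
for members, d in merges:
    print(f"merged {members} at distance {d:.3f}")
print("final clusters:", clusters)
```

Other linkage rules (complete or average linkage, for example) change only the line that computes `d`.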
More informationChapter 5 Introduction to Factorial Designs Solutions
Solutions from Montgomery, D. C. (1) Design and Analysis of Experiments, Wiley, NY Chapter 5 Introduction to Factorial Designs Solutions 5.1. The following output was obtained from a computer program that
More informationCorrelation. A statistics method to measure the relationship between two variables. Three characteristics
Correlation Correlation A statistics method to measure the relationship between two variables Three characteristics Direction of the relationship Form of the relationship Strength/Consistency Direction
More informationPrincipal Components Analysis (PCA)
Principal Components Analysis (PCA) Principal Components Analysis (PCA) a technique for finding patterns in data of high dimension Outline:. Eigenvectors and eigenvalues. PCA: a) Getting the data b) Centering
More informationPrincipal Component Analysis (PCA) Principal Component Analysis (PCA)
Recall: Eigenvectors of the Covariance Matrix Covariance matrices are symmetric. Eigenvectors are orthogonal Eigenvectors are ordered by the magnitude of eigenvalues: λ 1 λ 2 λ p {v 1, v 2,..., v n } Recall:
More informationExperimental Design and Data Analysis for Biologists
Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1
More informationConfidence Interval for the mean response
Week 3: Prediction and Confidence Intervals at specified x. Testing lack of fit with replicates at some x's. Inference for the correlation. Introduction to regression with several explanatory variables.
More informationDimension Reduction Techniques. Presented by Jie (Jerry) Yu
Dimension Reduction Techniques Presented by Jie (Jerry) Yu Outline Problem Modeling Review of PCA and MDS Isomap Local Linear Embedding (LLE) Charting Background Advances in data collection and storage
More informationRoss (1976) introduced the Arbitrage Pricing Theory (APT) as an alternative to the CAPM.
4.2 Arbitrage Pricing Model, APM Empirical evidence indicates that the CAPM beta does not completely explain the cross section of expected asset returns. This suggests that additional factors may be required.
More informationGroup comparison test for independent samples
Group comparison test for independent samples The purpose of the Analysis of Variance (ANOVA) is to test for significant differences between means. Supposing that: samples come from normal populations
More informationFactor Analysis and Kalman Filtering (11/2/04)
CS281A/Stat241A: Statistical Learning Theory Factor Analysis and Kalman Filtering (11/2/04) Lecturer: Michael I. Jordan Scribes: Byung-Gon Chun and Sunghoon Kim 1 Factor Analysis Factor analysis is used
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationAnalysis of Variance (ANOVA)
Analysis of Variance ANOVA) Compare several means Radu Trîmbiţaş 1 Analysis of Variance for a One-Way Layout 1.1 One-way ANOVA Analysis of Variance for a One-Way Layout procedure for one-way layout Suppose
More informationStat 217 Final Exam. Name: May 1, 2002
Stat 217 Final Exam Name: May 1, 2002 Problem 1. Three brands of batteries are under study. It is suspected that the lives (in weeks) of the three brands are different. Five batteries of each brand are
More information