Solution to Series 10
- Horace Elliott
- 5 years ago
Prof. Dr. M. Maathuis
Multivariate Statistics, SS 0

Solution to Series 10

1. a)
> bumpus <- read.table(" skip=0, nrows=9, col.names=c("id","total","alar","head","humerus","sternum"))
> bumpus <- bumpus[,-1]

b) The assumptions are:
- simple random sample from each population
- in each population the variables are multivariate normal
- the two populations have the same covariance matrix

c) $H_0: \mu_1 = \mu_2$, $H_A: \mu_1 \neq \mu_2$ (under the assumption that $\Sigma_1 = \Sigma_2$)

d) The $T^2$ statistic is defined as

$T^2 = \frac{n_1 n_2}{n_1 + n_2} D^2$, with $n = n_1 + n_2$ and $D^2 = (\bar{x}_1 - \bar{x}_2)^T S_u^{-1} (\bar{x}_1 - \bar{x}_2)$,

where $S_u = \frac{1}{n-2}\left((n_1 - 1) S_1 + (n_2 - 1) S_2\right)$ and $S_1, S_2$ are the sample covariance matrices of group 1 and group 2. Under $H_0$, $T^2 \sim T^2(p, n-2)$.

The F statistic is derived from $T^2$:

$F = \frac{n - p - 1}{(n-2)\,p} T^2$. Under $H_0$, $F \sim F_{p,\,n-p-1}$.

Computing the test statistics with R (note that the variables D and T below hold the squared quantities $D^2$ and $T^2$):
> bumpus.s <- bumpus[:,]
> bumpus.d <- bumpus[:9,]
> n.s <- nrow(bumpus.s)
> n.d <- nrow(bumpus.d)
> n <- n.s + n.d
> p <- 5
> # sample mean vectors:
> sample.mean.s <- apply(bumpus.s, 2, mean)
> sample.mean.d <- apply(bumpus.d, 2, mean)
> # pooled estimate for the covariance matrix:
> S.u <- ((n.s-1)*var(bumpus.s)+(n.d-1)*var(bumpus.d))/(n-2)
> S.u.inverse <- solve(S.u)
> # sample version of Mahalanobis distance (squared):
> D <- t(sample.mean.s-sample.mean.d)%*%S.u.inverse%*%(sample.mean.s-sample.mean.d)
> # T-squared statistic:
> (T <- n.s*n.d/n*D)
[,].8698
> # F-statistic
> (Fstat <- (n-1-p)/((n-2)*p)*T)
[,]

e) The p-value is the probability of observing a test statistic that is at least as extreme (in terms of $H_A$) as the one we saw, given that $H_0$ holds.
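The steps of part d) can be collected into a single function. The following is a minimal sketch in base R, not the course's reference code: the function name hotelling_two_sample is hypothetical, and the built-in iris data (versicolor vs. virginica, so p = 4) is used as stand-in input only because the bumpus file is not reproduced here.

```r
# Two-sample Hotelling T^2 test, following the formulas of part d).
# `hotelling_two_sample` is a hypothetical helper name (assumption).
hotelling_two_sample <- function(x1, x2) {
  x1 <- as.matrix(x1)
  x2 <- as.matrix(x2)
  n1 <- nrow(x1)
  n2 <- nrow(x2)
  n <- n1 + n2
  p <- ncol(x1)
  # difference of the sample mean vectors
  diff <- colMeans(x1) - colMeans(x2)
  # pooled estimate S_u of the common covariance matrix
  S.u <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n - 2)
  # squared sample Mahalanobis distance D^2
  D2 <- drop(t(diff) %*% solve(S.u) %*% diff)
  # T^2 statistic and the F statistic derived from it
  T2 <- n1 * n2 / n * D2
  Fstat <- (n - p - 1) / ((n - 2) * p) * T2
  p.value <- pf(Fstat, p, n - p - 1, lower.tail = FALSE)
  list(T2 = T2, Fstat = Fstat, df1 = p, df2 = n - p - 1, p.value = p.value)
}

# Stand-in data (assumption): two iris species instead of the sparrow groups.
x1 <- iris[iris$Species == "versicolor", 1:4]
x2 <- iris[iris$Species == "virginica", 1:4]
res <- hotelling_two_sample(x1, x2)
print(res)
```

Called as hotelling_two_sample(bumpus.s, bumpus.d), the function should return the same T-squared statistic, F statistic and p-value as the step-by-step computation in the solution.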
> (p.value <- pf(Fstat, p, n-1-p, lower.tail=FALSE))
[,]

The p-value is larger than 0.05, so there is not enough evidence in the data to say that $\mu_1 \neq \mu_2$; we do not reject $H_0$.

f)
> library(ICSNP)
> HotellingsT2(bumpus.d, bumpus.s)

Hotelling's two sample T2-test

data: bumpus.d and bumpus.s
T.2 = 0.567, df1 = 5, df2 = , p-value = 0.76
alternative hypothesis: true location difference is not equal to c(0,0,0,0,0)

2. a)
> data(iris)
> # only consider the species 'versicolor' and 'virginica'
> dat <- iris[c(51:150),]
> # re-factorize the last column to get rid of the empty class
> dat[,5] <- factor(dat[,5])

b)
> library(MASS)
> # compute lda:
> res <- lda(Species ~ ., data=dat)
> # show the computed vector "a":
> res$scaling
Sepal.Length
Sepal.Width
Petal.Length .8850
Petal.Width .870

c)
> # predict class of new observation
> newdat <- data.frame(Sepal.Length=6, Sepal.Width=, Petal.Length=, Petal.Width=)
> predict(res, newdata=newdat)$class
[1] versicolor
Levels: versicolor virginica

d)
> nobs <- nrow(dat)
> predictions <- array(NA, nobs)
> for (i in 1:nobs){
+   # fit without observation i, then predict the held-out observation
+   dat.temp <- dat[-i,]
+   res.temp <- lda(Species~., data=dat.temp, prior=c(0.5,0.5))
+   predictions[i] <- predict(res.temp, newdata=dat[i,c(1:4)])$class
+ }
> table(predictions, dat$Species)
predictions versicolor virginica
8 9
> (mcr <- sum(predictions!=as.numeric(dat[,"Species"]))/nobs)
[1] 0.0
> # or easier
> lda.cv <- lda(Species~., data=dat, prior=c(0.5, 0.5), CV=TRUE)
> res <- data.frame(est = lda.cv$class, true = dat[, 5])
> tab <- table(res)
> tab ## confusion matrix
est versicolor virginica
versicolor 8
virginica 9
> 1 - sum(diag(tab)) / nrow(dat)
[1] 0.0

The estimated misclassification rate is only 0.0%, so we expect very good predictions.

3. a)
> library(MASS)
> t.d <- d.vegenv[d.vegenv[,"vegetationgroup"]>=3,]
> t.r <- lda(vegetationgroup~sqrt(nardstri)+sqrt(caluvulg)+sqrt(festrubr), data=t.d)
> t.r
Call:
lda(vegetationgroup ~ sqrt(nardstri) + sqrt(caluvulg) + sqrt(festrubr), data = t.d)

Prior probabilities of groups:

Group means:
sqrt(nardstri) sqrt(caluvulg) sqrt(festrubr)

Coefficients of linear discriminants:
sqrt(nardstri)
sqrt(caluvulg)
sqrt(festrubr)

> plot(t.r)

[Figure: output of plot(t.r), one panel per group]

The call of the function plot shows the projection of the observations onto the discriminant. We can see that one group attains lower values than the other. The two groups can largely be separated, albeit not perfectly.

b)
> t.r.all <- lda(vegetationgroup~sqrt(nardstri)+sqrt(caluvulg)+sqrt(festrubr), data=d.vegenv)
> t.r.all
Call:
lda(vegetationgroup ~ sqrt(nardstri) + sqrt(caluvulg) + sqrt(festrubr), data = d.vegenv)

Prior probabilities of groups:

Group means:
sqrt(nardstri) sqrt(caluvulg) sqrt(festrubr)

Coefficients of linear discriminants:
LD LD
sqrt(nardstri)
sqrt(caluvulg)
sqrt(festrubr)

Proportion of trace:
LD LD

> plot(t.r.all)

[Figure: output of plot(t.r.all), observations plotted on the linear discriminants (axes labelled LD)]

As we can see, groups 2, 3 and 4 can be separated relatively well by the first two discriminants. The first group is comparatively small, and it is difficult to distinguish from the other three. The third discriminant does not seem to aid the classification into these groups.

c)
> # only groups 3 and 4
> t.pr <- predict(t.r)
> tab <- table(t.pr$class, t.d$vegetationgroup)
> tab
8
5
> 1-sum(diag(tab))/nrow(t.d)
[1]
> # all four groups
> t.pr.all <- predict(t.r.all)
> tab <- table(t.pr.all$class, d.vegenv$vegetationgroup)
> 1-sum(diag(tab))/nrow(d.vegenv)
[1] 0.95

The first table compares the predicted and true group membership for groups 3 and 4. These two groups are easily separated in this manner, with about 9% of all observations correctly classified (misclassification rate 7.%). The second table shows the 4-group classification. While groups 2, 3 and 4 can be easily separated, observations from group 1 are less often recognized as such. This difficulty in classifying group 1 is one we already saw in the plots of the previous part of this exercise. The misclassification rate here is .%.

d)
> t.r.cv <- lda(vegetationgroup~sqrt(nardstri)+sqrt(caluvulg)+sqrt(festrubr), data=t.d, CV=TRUE)
> res <- data.frame(est=t.r.cv$class, true=t.d$vegetationgroup)
> tab <- table(res)
> tab
est 8
> 1-sum(diag(tab))/nrow(t.d)
[1]
> t.r.all.cv <- lda(vegetationgroup~sqrt(nardstri)+sqrt(caluvulg)+sqrt(festrubr), data=d.vegenv, CV=TRUE)
> res <- data.frame(est=t.r.all.cv$class, true=d.vegenv$vegetationgroup)
> tab <- table(res)
> tab
est
> 1-sum(diag(tab))/nrow(d.vegenv)
[1] 0.65

For the two groups 3 and 4 alone, the estimated misclassification rates are exactly the same with CV and with the plug-in method. For all four groups, the misclassification rate estimated with CV (.6%) is higher than the one estimated by the plug-in method (.%). The plug-in method evaluates the classifier on the same data it was fitted to, so its estimate is in most cases too optimistic; the CV method should therefore be used.

4. No solution.
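The comparison in part d) between the plug-in and the cross-validated error estimate can be tried out on data that ships with R. The sketch below is an illustration under stated assumptions: the d.vegenv data is not available here, so two iris species stand in for the vegetation groups, and the names mcr.plugin and mcr.cv are hypothetical.

```r
library(MASS)  # provides lda()

# Stand-in data (assumption): two iris species instead of the vegetation groups.
dat <- droplevels(iris[iris$Species != "setosa", ])

# Plug-in estimate: fit the rule and predict on the same observations.
fit <- lda(Species ~ ., data = dat)
tab.plugin <- table(est = predict(fit)$class, true = dat$Species)
mcr.plugin <- 1 - sum(diag(tab.plugin)) / nrow(dat)

# Leave-one-out cross-validation: with CV = TRUE, lda() returns the class
# predicted for each observation by a fit that excludes that observation.
fit.cv <- lda(Species ~ ., data = dat, CV = TRUE)
tab.cv <- table(est = fit.cv$class, true = dat$Species)
mcr.cv <- 1 - sum(diag(tab.cv)) / nrow(dat)

# Both estimated misclassification rates side by side.
print(c(plugin = mcr.plugin, cv = mcr.cv))
```

The two estimates may coincide on well-separated data, as happened for the two-group case in the solution; the plug-in estimate tends to fall below the CV estimate when groups overlap more.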