(b) The sample canonical correlation matrix is reported below:


Multivariate data analysis HW#4

1. Solution:
(a) The estimated coefficients for X are: So j! = ( , , ). The estimated coefficients for Y are: So h! = ( , , ).
(b) The sample canonical correlation matrix is reported below: We can see that the correlation coefficients between (ξ_i, ξ_j), (ω_i, ω_j), and (ξ_i, ω_j) for i ≠ j are all approximately zero.
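The claim in (b) can be checked numerically. The following is a minimal sketch, assuming simulated stand-in data (the homework's X and Y are not available here) and using `cancor` from base R's stats package; the variable names `xi`, `omega`, and `R` are illustrative only.

```r
# Hypothetical stand-in data: three X variables and three correlated Y variables.
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), n, 3)
Y <- 0.5 * X + matrix(rnorm(n * 3), n, 3)

cc <- cancor(X, Y)  # centers X and Y by default

# Canonical variates: centered data times the estimated coefficient vectors.
xi    <- scale(X, center = cc$xcenter, scale = FALSE) %*% cc$xcoef
omega <- scale(Y, center = cc$ycenter, scale = FALSE) %*% cc$ycoef

# 6 x 6 sample correlation matrix of (xi_1, xi_2, xi_3, omega_1, omega_2, omega_3).
R <- cor(cbind(xi, omega))
print(round(R, 3))
```

In this sketch the only nonzero off-diagonal entries of `R` are the correlations between ξ_i and ω_i with the same index, which equal the canonical correlations in `cc$cor`.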

2. Solution:
(a) Visualize the first observation by plotting x against y as follows: We can see that this gives us a digit 3.
(b) The scree plot and cumulative scree plot are:

And the scatterplot matrix of the principal components is:
(c) The data look as if they come from a multivariate normal (MVN) distribution: examining the density of each principal component shows that every one has a shape very close to normal. The principal components are linear combinations of the original data, and a random vector is jointly MVN if all linear combinations of its components are normally distributed, so this gives a sense that the data look as if they are from an MVN distribution.
(d) To see how many principal components we should keep, I draw the cumulative scree plot with a horizontal line at 90% below:
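The visual check in (c) can be complemented by a numeric one. The sketch below is an illustration on simulated stand-in data, not the solution's own code: it computes, for each principal component, the correlation of its normal Q-Q points and a Shapiro-Wilk p-value. Marginal normality of the components is only a necessary condition for joint normality, so this remains a plausibility check.

```r
# Hypothetical stand-in data: 500 draws from a 4-dimensional normal distribution.
set.seed(2)
X <- matrix(rnorm(500 * 4), 500, 4) %*% matrix(runif(16), 4, 4)

Z <- princomp(X)$scores  # principal component scores, one column per component

# Correlation between theoretical and sample quantiles (close to 1 for normal data).
qq_cor <- apply(Z, 2, function(z) {
  qq <- qqnorm(z, plot.it = FALSE)
  cor(qq$x, qq$y)
})

# Shapiro-Wilk test per component; small p-values argue against normality.
p_values <- apply(Z, 2, function(z) shapiro.test(z)$p.value)

print(round(qq_cor, 4))
print(round(p_values, 3))
```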

After examining the cumulative scree plot above and applying the 90% cutoff criterion, I will keep the first six or seven principal components, since six components fall just short of 0.90.
(e) Now I walk along each of the first four principal components. For each principal component, I plot five points: the mean and the points at ±1σ and ±2σ along that component. All twenty plots are copied below, with each row showing the five plots for one of the four principal components:
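Both steps can be written compactly. The following is a minimal sketch on simulated stand-in data (the pen-digit data are not available here): it finds the smallest number of components whose cumulative variance proportion reaches 90%, then computes the five points mean + c·sqrt(λ_1)·u_1 for c = -2, -1, 0, 1, 2 along the first component. The names `k90`, `steps`, and `walk` are illustrative.

```r
# Hypothetical stand-in data with six variables of decreasing variance.
set.seed(3)
X <- matrix(rnorm(300 * 6), 300, 6) %*% diag(c(5, 3, 2, 1, 0.5, 0.2))

fit <- princomp(X)
lambda <- fit$sdev^2                      # component variances (eigenvalues)
cum_prop <- cumsum(lambda) / sum(lambda)  # cumulative proportion of variance

# Smallest number of components reaching the 90% cutoff.
k90 <- which(cum_prop >= 0.90)[1]

# Walk along the first component: one row per step, one column per variable.
steps <- c(-2, -1, 0, 1, 2)
walk <- t(sapply(steps, function(s) {
  colMeans(X) + s * sqrt(lambda[1]) * fit$loadings[, 1]
}))

print(k90)
print(round(walk, 2))
```

The middle row of `walk` is the sample mean; the other rows are the mean shifted by one or two standard deviations of the first component's scores.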

(f) After combining the two data sets, I run a PCA on the new data set and copy the scatterplot matrix of the principal components below.

The scatterplot matrix above is too small, so in order to see whether there is a nice separation between the observations corresponding to digit 3 and digit 8, I draw another scatterplot matrix containing only the first several principal components. First we need to check how many principal components to keep, so I draw the following scree plot and cumulative scree plot: By the 90% cutoff criterion, I will keep the first five principal components and draw the scatterplot matrix of the first five principal components as follows:

After examining the graphs above, we can clearly see a relatively nice separation between the digit 3 (blue dots) and the digit 8 (red crosses) in the scatterplots of principal component 1 against principal components 2, 3, 4, and 5. So we can separate digit 3 and digit 8.

R code:

# part a
x <- as.numeric(pendigit3[1, seq(1, 15, by = 2)])  # x-coordinates (odd columns)
y <- as.numeric(pendigit3[1, seq(2, 16, by = 2)])  # y-coordinates (even columns)
plot(x, y)
lines(x, y)

# parts b and d
x <- pendigit3[, 1:16]
spr <- princomp(x)
U <- spr$loadings   # eigenvectors
L <- (spr$sdev)^2   # eigenvalues
Z <- spr$scores     # principal component scores
pairs(Z, pch = 1, col = "blue")
par(mfrow = c(1, 2))
plot(L, type = "b", xlab = "component", ylab = "lambda", main = "scree plot")
plot(cumsum(L) / sum(L) * 100, ylim = c(0, 100), type = "b", xlab = "component", ylab = "cumulative proportion (%)", main = "cum. scree plot")
# part d: cumulative scree plot again, with the 90% cutoff line
plot(cumsum(L) / sum(L) * 100, ylim = c(0, 100), type = "b", xlab = "component", ylab = "cumulative proportion (%)", main = "cum. scree plot")
abline(h = 90)

# part e (shown for the first principal component)
x <- pendigit3[, 1:16]
spr <- princomp(x)
U <- spr$loadings
L <- (spr$sdev)^2
Z <- spr$scores
par(mfrow = c(1, 5))
v1 <- colMeans(x)                    # mean of each of the 16 coordinates
a1 <- v1 - 1 * sqrt(L[1]) * U[, 1]   # mean - 1 sd along PC1
a2 <- v1 - 2 * sqrt(L[1]) * U[, 1]   # mean - 2 sd along PC1
a3 <- v1                             # mean
a4 <- v1 + sqrt(L[1]) * U[, 1]       # mean + 1 sd along PC1
a5 <- v1 + 2 * sqrt(L[1]) * U[, 1]   # mean + 2 sd along PC1
odd <- seq(1, 15, by = 2)            # x-coordinate positions
even <- seq(2, 16, by = 2)           # y-coordinate positions
for (a in list(a1, a2, a3, a4, a5)) {
  plot(a[odd], a[even], xlab = "x", ylab = "y")
  lines(a[odd], a[even])
}

# part f
total <- rbind(pendigit3, pendigit8)
Y <- total[, 1:16]
spr2 <- princomp(Y)
U2 <- spr2$loadings
L2 <- (spr2$sdev)^2
Z2 <- spr2$scores
pairs(Z2, pch = c(rep(1, 1055), rep(3, 1055)), col = c(rep("blue", 1055), rep("red", 1055)))
par(mfrow = c(1, 2))
plot(L2, type = "b", xlab = "component", ylab = "lambda", main = "scree plot")
plot(cumsum(L2) / sum(L2) * 100, ylim = c(0, 100), type = "b", xlab = "component", ylab = "cumulative proportion (%)", main = "cum. scree plot")
abline(h = 90)  # 90% cutoff line on the cumulative scree plot
pairs(Z2[, 1:5], pch = c(rep(1, 1055), rep(3, 1055)), col = c(rep("blue", 1055), rep("red", 1055)))

3. Solution: Use 10-fold cross validation to report the misclassification rates of the four classifiers as follows:
(1) LDA: The 10-fold cross validation misclassification rate is
(2) Nearest Centroid rule: The 10-fold cross validation misclassification rate is
(3) QDA: The 10-fold cross validation misclassification rate is
(4) 5-Nearest Neighbors: The 10-fold cross validation misclassification rate is 0.
We can use the following bar plot to report the results above:

4. Solution:

5. Solution: The code below is R pseudo-code for the 10-fold cross validation calculation used in the two questions above.

cross_validation <- function(X, y, V = 10, classify_fun, predict_fun) {
  n <- length(y)
  fold_size <- ceiling(n / V)
  index <- sample(n, n)  # random permutation of 1..n
  CVerror <- 0
  for (v in 1:V) {
    start <- (v - 1) * fold_size + 1
    if (start > n) break  # no observations left for this fold
    temp <- start:min(v * fold_size, n)  # positions of the v-th fold
    test_index <- index[temp]
    train_index <- index[-temp]
    traindata <- X[train_index, ]
    trainy <- y[train_index]
    testdata <- X[test_index, ]
    testy <- y[test_index]
    model <- classify_fun(traindata, trainy)
    prediction <- predict_fun(model, testdata)
    CVerror <- CVerror + sum(prediction != testy)  # misclassified test cases
  }
  CVerror <- CVerror / n  # overall misclassification rate
  return(CVerror)
}
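As a concrete illustration of the 10-fold procedure described above, here is a minimal self-contained sketch. It makes several assumptions that do not come from the original solution: it assigns random fold labels instead of slicing contiguous index blocks, it uses `lda` from the MASS package as the classifier, and it uses the built-in `iris` data as a stand-in for the homework data.

```r
library(MASS)  # provides lda(); MASS ships with standard R distributions

set.seed(4)
X <- as.matrix(iris[, 1:4])  # stand-in predictors, not the homework data
y <- iris$Species            # stand-in class labels
V <- 10

# Random fold labels 1..V, as balanced as possible.
fold <- sample(rep(seq_len(V), length.out = nrow(X)))

errors <- 0
for (v in seq_len(V)) {
  test <- fold == v
  model <- lda(X[!test, , drop = FALSE], grouping = y[!test])  # fit on V-1 folds
  pred <- predict(model, X[test, , drop = FALSE])$class        # predict held-out fold
  errors <- errors + sum(pred != y[test])
}

cv_error <- errors / nrow(X)  # 10-fold misclassification rate
print(cv_error)
```

Labeling folds rather than slicing index ranges avoids off-by-one errors at fold boundaries and handles sample sizes that are not a multiple of the number of folds.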

Lecture 4: Principal Component Analysis and Linear Dimension Reduction

Lecture 4: Principal Component Analysis and Linear Dimension Reduction Lecture 4: Principal Component Analysis and Linear Dimension Reduction Advanced Applied Multivariate Analysis STAT 2221, Fall 2013 Sungkyu Jung Department of Statistics University of Pittsburgh E-mail:

More information

Algebra of Principal Component Analysis

Algebra of Principal Component Analysis Algebra of Principal Component Analysis 3 Data: Y = 5 Centre each column on its mean: Y c = 7 6 9 y y = 3..6....6.8 3. 3.8.6 Covariance matrix ( variables): S = -----------Y n c ' Y 8..6 c =.6 5.8 Equation

More information

Chapter 9 Ingredients of Multivariable Change: Models, Graphs, Rates

Chapter 9 Ingredients of Multivariable Change: Models, Graphs, Rates Chapter 9 Ingredients of Multivariable Change: Models, Graphs, Rates 9.1 Multivariable Functions and Contour Graphs Although Excel can easily draw 3-dimensional surfaces, they are often difficult to mathematically

More information

Prelab: Complete the prelab section BEFORE class Purpose:

Prelab: Complete the prelab section BEFORE class Purpose: Lab: Projectile Motion Prelab: Complete the prelab section BEFORE class Purpose: What is the Relationship between and for the situation of a cannon ball shot off a with an angle of from the horizontal.

More information

Frequency and Histograms

Frequency and Histograms Warm Up Lesson Presentation Lesson Quiz Algebra 1 Create stem-and-leaf plots. Objectives Create frequency tables and histograms. Vocabulary stem-and-leaf plot frequency frequency table histogram cumulative

More information

STA 414/2104, Spring 2014, Practice Problem Set #1

STA 414/2104, Spring 2014, Practice Problem Set #1 STA 44/4, Spring 4, Practice Problem Set # Note: these problems are not for credit, and not to be handed in Question : Consider a classification problem in which there are two real-valued inputs, and,

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Machine Learning Problem Set 2 Due date: Wednesday October 6 Please address all questions and comments about this problem set to 6867-staff@csail.mit.edu. You will need to use MATLAB for some of

More information

Unsupervised clustering of COMBO-17 galaxy photometry

Unsupervised clustering of COMBO-17 galaxy photometry STScI Astrostatistics R tutorials Eric Feigelson (Penn State) November 2011 SESSION 2 Multivariate clustering and classification ***************** ***************** Unsupervised clustering of COMBO-17

More information

Classification 2: Linear discriminant analysis (continued); logistic regression

Classification 2: Linear discriminant analysis (continued); logistic regression Classification 2: Linear discriminant analysis (continued); logistic regression Ryan Tibshirani Data Mining: 36-462/36-662 April 4 2013 Optional reading: ISL 4.4, ESL 4.3; ISL 4.3, ESL 4.4 1 Reminder:

More information

Supervised Learning: Linear Methods (1/2) Applied Multivariate Statistics Spring 2012

Supervised Learning: Linear Methods (1/2) Applied Multivariate Statistics Spring 2012 Supervised Learning: Linear Methods (1/2) Applied Multivariate Statistics Spring 2012 Overview Review: Conditional Probability LDA / QDA: Theory Fisher s Discriminant Analysis LDA: Example Quality control:

More information

Short Answer Questions: Answer on your separate blank paper. Points are given in parentheses.

Short Answer Questions: Answer on your separate blank paper. Points are given in parentheses. ISQS 6348 Final exam solutions. Name: Open book and notes, but no electronic devices. Answer short answer questions on separate blank paper. Answer multiple choice on this exam sheet. Put your name on

More information

Machine Learning, Fall 2009: Midterm

Machine Learning, Fall 2009: Midterm 10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all

More information

CS 361: Probability & Statistics

CS 361: Probability & Statistics January 24, 2018 CS 361: Probability & Statistics Relationships in data Standard coordinates If we have two quantities of interest in a dataset, we might like to plot their histograms and compare the two

More information

Chemometrics: Classification of spectra

Chemometrics: Classification of spectra Chemometrics: Classification of spectra Vladimir Bochko Jarmo Alander University of Vaasa November 1, 2010 Vladimir Bochko Chemometrics: Classification 1/36 Contents Terminology Introduction Big picture

More information

Principal component analysis

Principal component analysis Principal component analysis Motivation i for PCA came from major-axis regression. Strong assumption: single homogeneous sample. Free of assumptions when used for exploration. Classical tests of significance

More information

Table of Contents. Multivariate methods. Introduction II. Introduction I

Table of Contents. Multivariate methods. Introduction II. Introduction I Table of Contents Introduction Antti Penttilä Department of Physics University of Helsinki Exactum summer school, 04 Construction of multinormal distribution Test of multinormality with 3 Interpretation

More information

Graphing Data. Example:

Graphing Data. Example: Graphing Data Bar graphs and line graphs are great for looking at data over time intervals, or showing the rise and fall of a quantity over the passage of time. Example: Auto Sales by Year Year Number

More information

Introduction GeoXp : an R package for interactive exploratory spatial data analysis. Illustration with a data set of schools in Midi-Pyrénées.

Introduction GeoXp : an R package for interactive exploratory spatial data analysis. Illustration with a data set of schools in Midi-Pyrénées. Presentation of Presentation of Use of Introduction : an R package for interactive exploratory spatial data analysis. Illustration with a data set of schools in Midi-Pyrénées. Authors of : Christine Thomas-Agnan,

More information

Lecture 4 Discriminant Analysis, k-nearest Neighbors

Lecture 4 Discriminant Analysis, k-nearest Neighbors Lecture 4 Discriminant Analysis, k-nearest Neighbors Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University. Email: fredrik.lindsten@it.uu.se fredrik.lindsten@it.uu.se

More information

Properties of Continuous Probability Distributions The graph of a continuous probability distribution is a curve. Probability is represented by area

Properties of Continuous Probability Distributions The graph of a continuous probability distribution is a curve. Probability is represented by area Properties of Continuous Probability Distributions The graph of a continuous probability distribution is a curve. Probability is represented by area under the curve. The curve is called the probability

More information

Gaussian Models

Gaussian Models Gaussian Models ddebarr@uw.edu 2016-04-28 Agenda Introduction Gaussian Discriminant Analysis Inference Linear Gaussian Systems The Wishart Distribution Inferring Parameters Introduction Gaussian Density

More information

Regularized Discriminant Analysis and Reduced-Rank LDA

Regularized Discriminant Analysis and Reduced-Rank LDA Regularized Discriminant Analysis and Reduced-Rank LDA Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Regularized Discriminant Analysis A compromise between LDA and

More information

Visualizing Tests for Equality of Covariance Matrices Supplemental Appendix

Visualizing Tests for Equality of Covariance Matrices Supplemental Appendix Visualizing Tests for Equality of Covariance Matrices Supplemental Appendix Michael Friendly and Matthew Sigal September 18, 2017 Contents Introduction 1 1 Visualizing mean differences: The HE plot framework

More information

Linear Dimensionality Reduction

Linear Dimensionality Reduction Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Principal Component Analysis 3 Factor Analysis

More information

Introduction to Spatial Analysis. Spatial Analysis. Session organization. Learning objectives. Module organization. GIS and spatial analysis

Introduction to Spatial Analysis. Spatial Analysis. Session organization. Learning objectives. Module organization. GIS and spatial analysis Introduction to Spatial Analysis I. Conceptualizing space Session organization Module : Conceptualizing space Module : Spatial analysis of lattice data Module : Spatial analysis of point patterns Module

More information

Contents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)

Contents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II) Contents Lecture Lecture Linear Discriminant Analysis Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University Email: fredriklindsten@ituuse Summary of lecture

More information

Lecture 6: Methods for high-dimensional problems

Lecture 6: Methods for high-dimensional problems Lecture 6: Methods for high-dimensional problems Hector Corrada Bravo and Rafael A. Irizarry March, 2010 In this Section we will discuss methods where data lies on high-dimensional spaces. In particular,

More information

ECE 592 Topics in Data Science

ECE 592 Topics in Data Science ECE 592 Topics in Data Science Final Fall 2017 December 11, 2017 Please remember to justify your answers carefully, and to staple your test sheet and answers together before submitting. Name: Student ID:

More information

In-class determine between which cube root. Assignments two consecutive whole

In-class determine between which cube root. Assignments two consecutive whole Unit 1: Expressions Part A: Powers and Roots (2-3 Weeks) Essential Questions How can algebraic expressions be used to model, analyze, and solve mathematical situations? I CAN Statements Vocabulary Standards

More information

Introduction to Signal Detection and Classification. Phani Chavali

Introduction to Signal Detection and Classification. Phani Chavali Introduction to Signal Detection and Classification Phani Chavali Outline Detection Problem Performance Measures Receiver Operating Characteristics (ROC) F-Test - Test Linear Discriminant Analysis (LDA)

More information

CLASSICAL NORMAL-BASED DISCRIMINANT ANALYSIS

CLASSICAL NORMAL-BASED DISCRIMINANT ANALYSIS CLASSICAL NORMAL-BASED DISCRIMINANT ANALYSIS EECS 833, March 006 Geoff Bohling Assistant Scientist Kansas Geological Survey geoff@gs.u.edu 864-093 Overheads and resources available at http://people.u.edu/~gbohling/eecs833

More information

STATISTICAL LEARNING SYSTEMS

STATISTICAL LEARNING SYSTEMS STATISTICAL LEARNING SYSTEMS LECTURE 8: UNSUPERVISED LEARNING: FINDING STRUCTURE IN DATA Institute of Computer Science, Polish Academy of Sciences Ph. D. Program 2013/2014 Principal Component Analysis

More information

EDAMI DIMENSION REDUCTION BY PRINCIPAL COMPONENT ANALYSIS

EDAMI DIMENSION REDUCTION BY PRINCIPAL COMPONENT ANALYSIS EDAMI DIMENSION REDUCTION BY PRINCIPAL COMPONENT ANALYSIS Mario Romanazzi October 29, 2017 1 Introduction An important task in multidimensional data analysis is reduction in complexity. Recalling that

More information

ISyE 6416: Computational Statistics Spring Lecture 5: Discriminant analysis and classification

ISyE 6416: Computational Statistics Spring Lecture 5: Discriminant analysis and classification ISyE 6416: Computational Statistics Spring 2017 Lecture 5: Discriminant analysis and classification Prof. Yao Xie H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology

More information

Classification: Linear Discriminant Analysis

Classification: Linear Discriminant Analysis Classification: Linear Discriminant Analysis Discriminant analysis uses sample information about individuals that are known to belong to one of several populations for the purposes of classification. Based

More information

Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis

Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis For example Data reduction approaches Cluster analysis Principal components analysis

More information

Effective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data

Effective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data Effective Linear Discriant Analysis for High Dimensional, Low Sample Size Data Zhihua Qiao, Lan Zhou and Jianhua Z. Huang Abstract In the so-called high dimensional, low sample size (HDLSS) settings, LDA

More information

(a) Find the value of x. (4) Write down the standard deviation. (2) (Total 6 marks)

(a) Find the value of x. (4) Write down the standard deviation. (2) (Total 6 marks) 1. The following frequency distribution of marks has mean 4.5. Mark 1 2 3 4 5 6 7 Frequency 2 4 6 9 x 9 4 Find the value of x. (4) Write down the standard deviation. (Total 6 marks) 2. The following table

More information

Didacticiel - Études de cas

Didacticiel - Études de cas 1 Topic New features for PCA (Principal Component Analysis) in Tanagra 1.4.45 and later: tools for the determination of the number of factors. Principal Component Analysis (PCA) 1 is a very popular dimension

More information

Principal Component Analysis (PCA) Theory, Practice, and Examples

Principal Component Analysis (PCA) Theory, Practice, and Examples Principal Component Analysis (PCA) Theory, Practice, and Examples Data Reduction summarization of data with many (p) variables by a smaller set of (k) derived (synthetic, composite) variables. p k n A

More information

Contents. 13. Graphs of Trigonometric Functions 2 Example Example

Contents. 13. Graphs of Trigonometric Functions 2 Example Example Contents 13. Graphs of Trigonometric Functions 2 Example 13.19............................... 2 Example 13.22............................... 5 1 Peterson, Technical Mathematics, 3rd edition 2 Example 13.19

More information

STATS306B STATS306B. Discriminant Analysis. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010

STATS306B STATS306B. Discriminant Analysis. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010 STATS306B Discriminant Analysis Jonathan Taylor Department of Statistics Stanford University June 3, 2010 Spring 2010 Classification Given K classes in R p, represented as densities f i (x), 1 i K classify

More information

Data Preprocessing Tasks

Data Preprocessing Tasks Data Tasks 1 2 3 Data Reduction 4 We re here. 1 Dimensionality Reduction Dimensionality reduction is a commonly used approach for generating fewer features. Typically used because too many features can

More information

Principal Components Analysis. Sargur Srihari University at Buffalo

Principal Components Analysis. Sargur Srihari University at Buffalo Principal Components Analysis Sargur Srihari University at Buffalo 1 Topics Projection Pursuit Methods Principal Components Examples of using PCA Graphical use of PCA Multidimensional Scaling Srihari 2

More information

MATH 1150 Chapter 2 Notation and Terminology

MATH 1150 Chapter 2 Notation and Terminology MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the

More information

1 of 7 8/11/2014 10:27 AM Units: Teacher: PacedAlgebraPartA, CORE Course: PacedAlgebraPartA Year: 2012-13 The Language of Algebra What is the difference between an algebraic expression and an algebraic

More information

Pattern Recognition 2

Pattern Recognition 2 Pattern Recognition 2 KNN,, Dr. Terence Sim School of Computing National University of Singapore Outline 1 2 3 4 5 Outline 1 2 3 4 5 The Bayes Classifier is theoretically optimum. That is, prob. of error

More information

A Scientific Model for Free Fall.

A Scientific Model for Free Fall. A Scientific Model for Free Fall. I. Overview. This lab explores the framework of the scientific method. The phenomenon studied is the free fall of an object released from rest at a height H from the ground.

More information

Computational Genomics

Computational Genomics Computational Genomics http://www.cs.cmu.edu/~02710 Introduction to probability, statistics and algorithms (brief) intro to probability Basic notations Random variable - referring to an element / event

More information

Grade 3. Grade 3 K 8 Standards 23

Grade 3. Grade 3 K 8 Standards 23 Grade 3 In grade 3, instructional time should focus on four critical areas: (1) developing understanding of multiplication and division and strategies for multiplication and division within 100; (2) developing

More information

Computer exercise 3: PCA, CCA and factors. Principal component analysis. Eigenvalues and eigenvectors

Computer exercise 3: PCA, CCA and factors. Principal component analysis. Eigenvalues and eigenvectors UPPSALA UNIVERSITY Department of Mathematics Måns Thulin Multivariate Methods Spring 2011 thulin@math.uu.se Computer exercise 3: PCA, CCA and factors In this computer exercise the following topics are

More information

Lecture 1: Descriptive Statistics

Lecture 1: Descriptive Statistics Lecture 1: Descriptive Statistics MSU-STT-351-Sum 15 (P. Vellaisamy: MSU-STT-351-Sum 15) Probability & Statistics for Engineers 1 / 56 Contents 1 Introduction 2 Branches of Statistics Descriptive Statistics

More information

Kernel Sliced Inverse Regression With Applications to Classification

Kernel Sliced Inverse Regression With Applications to Classification May 21-24, 2008 in Durham, NC Kernel Sliced Inverse Regression With Applications to Classification Han-Ming Wu (Hank) Department of Mathematics, Tamkang University Taipei, Taiwan 2008/05/22 http://www.hmwu.idv.tw

More information

Statistics 202: Data Mining. c Jonathan Taylor. Week 2 Based in part on slides from textbook, slides of Susan Holmes. October 3, / 1

Statistics 202: Data Mining. c Jonathan Taylor. Week 2 Based in part on slides from textbook, slides of Susan Holmes. October 3, / 1 Week 2 Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Part I Other datatypes, preprocessing 2 / 1 Other datatypes Document data You might start with a collection of

More information

Part I. Other datatypes, preprocessing. Other datatypes. Other datatypes. Week 2 Based in part on slides from textbook, slides of Susan Holmes

Part I. Other datatypes, preprocessing. Other datatypes. Other datatypes. Week 2 Based in part on slides from textbook, slides of Susan Holmes Week 2 Based in part on slides from textbook, slides of Susan Holmes Part I Other datatypes, preprocessing October 3, 2012 1 / 1 2 / 1 Other datatypes Other datatypes Document data You might start with

More information

CS 5630/6630 Scientific Visualization. Elementary Plotting Techniques II

CS 5630/6630 Scientific Visualization. Elementary Plotting Techniques II CS 5630/6630 Scientific Visualization Elementary Plotting Techniques II Motivation Given a certain type of data, what plotting technique should I use? What plotting techniques should be avoided? How do

More information

Are You Ready? Multiply and Divide Fractions

Are You Ready? Multiply and Divide Fractions SKILL Multiply and Divide Fractions eaching Skill Objective Multiply and divide fractions. Review with students the steps for multiplying fractions. Point out that it is a good idea to write fraction multiplication

More information

UNIVERSITY OF NORTH CAROLINA CHARLOTTE 1995 HIGH SCHOOL MATHEMATICS CONTEST March 13, 1995 (C) 10 3 (D) = 1011 (10 1) 9

UNIVERSITY OF NORTH CAROLINA CHARLOTTE 1995 HIGH SCHOOL MATHEMATICS CONTEST March 13, 1995 (C) 10 3 (D) = 1011 (10 1) 9 UNIVERSITY OF NORTH CAROLINA CHARLOTTE 5 HIGH SCHOOL MATHEMATICS CONTEST March, 5. 0 2 0 = (A) (B) 0 (C) 0 (D) 0 (E) 0 (E) 0 2 0 = 0 (0 ) = 0 2. If z = x, what are all the values of y for which (x + y)

More information

Symmetry Transforms 1

Symmetry Transforms 1 Symmetry Transforms 1 Motivation Symmetry is everywhere 2 Motivation Symmetry is everywhere Perfect Symmetry [Blum 64, 67] [Wolter 85] [Minovic 97] [Martinet 05] 3 Motivation Symmetry is everywhere Local

More information

An Introduction to Machine Learning

An Introduction to Machine Learning An Introduction to Machine Learning L6: Structured Estimation Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune, January

More information

MSA220 Statistical Learning for Big Data

MSA220 Statistical Learning for Big Data MSA220 Statistical Learning for Big Data Lecture 4 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology More on Discriminant analysis More on Discriminant

More information

Mathematics Grade 3. grade 3 21

Mathematics Grade 3. grade 3 21 Mathematics Grade 3 In Grade 3, instructional time should focus on four critical areas: (1) developing understanding of multiplication and division and strategies for multiplication and division within

More information

Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad

Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad Key message Spatial dependence First Law of Geography (Waldo Tobler): Everything is related to everything else, but near things

More information

Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides

Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Intelligent Data Analysis and Probabilistic Inference Lecture

More information

A Bias Correction for the Minimum Error Rate in Cross-validation

A Bias Correction for the Minimum Error Rate in Cross-validation A Bias Correction for the Minimum Error Rate in Cross-validation Ryan J. Tibshirani Robert Tibshirani Abstract Tuning parameters in supervised learning problems are often estimated by cross-validation.

More information

Statistical Concepts. Constructing a Trend Plot

Statistical Concepts. Constructing a Trend Plot Module 1: Review of Basic Statistical Concepts 1.2 Plotting Data, Measures of Central Tendency and Dispersion, and Correlation Constructing a Trend Plot A trend plot graphs the data against a variable

More information

The Principal Component Analysis

The Principal Component Analysis The Principal Component Analysis Philippe B. Laval KSU Fall 2017 Philippe B. Laval (KSU) PCA Fall 2017 1 / 27 Introduction Every 80 minutes, the two Landsat satellites go around the world, recording images

More information

The role of dimensionality reduction in classification

The role of dimensionality reduction in classification The role of dimensionality reduction in classification Weiran Wang and Miguel Á. Carreira-Perpiñán Electrical Engineering and Computer Science University of California, Merced http://eecs.ucmerced.edu

More information

Calculating Weight Using Volume

Calculating Weight Using Volume Another Way to Get the Answer Advanced Calculate the weight of a steel reinforcing bar (long cylinder) if length is 50 feet, the cross-sectional area is 0.785 in 2 (Bar diameter is 1 inch.) The density

More information

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, data types 3 Data sources and preparation Project 1 out 4

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, data types 3 Data sources and preparation Project 1 out 4 Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, data types 3 Data sources and preparation Project 1 out 4 Data reduction, similarity & distance, data augmentation

More information

Package rrr. R topics documented: December 9, Title Reduced-Rank Regression Version URL

Package rrr. R topics documented: December 9, Title Reduced-Rank Regression Version URL Title Reduced-Ran Regression Version 1.0.0 URL http://github.com/chrisadd/rrr Pacage rrr December 9, 2016 Reduced-ran regression, diagnostics and graphics. Depends R (>= 3.2.0) Imports Rcpp, MASS, magrittr,

More information
