
1 BIO5312 Biostatistics, R Session 12: Principal Component Analysis
Dr. Junchao Xia, Center of Biophysics and Computational Biology, Fall 2016

2 Matrix Operations I: Constructing a Matrix with matrix(data, nrow, ncol, byrow)

# set the working directory
> setwd("c:/users/junchao/desktop/biostatistics_5312/2016/lab_12")
# put a sequence of data into a matrix
> seq1 = seq(1:6)
> seq1
[1] 1 2 3 4 5 6
> mat1 = matrix(seq1, nrow=2)
> mat1
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
# fill the matrix by rows
> mat2 = matrix(seq1, nrow=2, byrow=TRUE)
> mat2
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
# generate 20 random numbers and put them in a matrix
> mat3 <- matrix(rnorm(20), 4)
> mat3
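As a supplementary sketch (not part of the original lab), the same kinds of matrices can also be assembled from vectors with rbind()/cbind(), and dim(), nrow(), and ncol() report their shape:

# supplementary sketch: alternative ways to build and inspect a matrix
v1 <- c(1, 3, 5)
v2 <- c(2, 4, 6)
rbind(v1, v2)            # bind vectors as rows    -> 2 x 3 matrix
cbind(v1, v2)            # bind vectors as columns -> 3 x 2 matrix
mat1 <- matrix(seq(1:6), nrow = 2)
dim(mat1)                # 2 3
nrow(mat1)               # 2
ncol(mat1)               # 3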

3 Matrix Operations II: Elements

# operations by elements
> mat1 - mat2
> mat1 * 3
> mat1 - 4
> mat1 * mat2
> mat1 / mat2
> mat1[1,3]
# transpose
> t(mat1)
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
# matrix multiplication
> mat3 = mat1 %*% t(mat2)
> mat3
     [,1] [,2]
[1,]   22   49
[2,]   28   64
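To make the distinction on this slide concrete, the short sketch below (an addition, not from the slides) contrasts the element-wise product * with the matrix product %*%; tcrossprod() is a built-in shortcut for A %*% t(B):

# element-wise product vs. matrix product
mat1 <- matrix(1:6, nrow = 2)                 # filled by column
mat2 <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled by row
mat1 * mat2                  # element-wise: both matrices must be 2 x 3
mat1 %*% t(mat2)             # matrix product: (2 x 3) %*% (3 x 2) -> 2 x 2
all.equal(mat1 %*% t(mat2), tcrossprod(mat1, mat2))   # TRUE: same result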

4 Matrix Operations III: Inverse Matrix

> help(solve)
Solve a System of Equations
Description: This generic function solves the equation a %*% x = b for x, where b can be either a vector or a matrix.
Usage:
  solve(a, b, ...)
  ## Default S3 method:
  solve(a, b, tol, LINPACK = FALSE, ...)
Arguments:
  a    a square numeric or complex matrix containing the coefficients of the linear system. Logical matrices are coerced to numeric.
  b    a numeric or complex vector or matrix giving the right-hand side(s) of the linear system. If missing, b is taken to be an identity matrix and solve will return the inverse of a.
  tol  the tolerance for detecting linear dependencies in the columns of a.

> solve(mat3)
           [,1]       [,2]
[1,]  1.7777778 -1.3611111
[2,] -0.7777778  0.6111111
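A quick check, added here as a sketch using the 2 x 2 mat3 from the previous slide, is that a matrix times its inverse gives the identity, and that solve(a, b) solves a linear system directly:

# verify the inverse and solve a linear system
mat3 <- matrix(c(22, 28, 49, 64), nrow = 2)   # mat1 %*% t(mat2) from slide 3
inv3 <- solve(mat3)            # inverse, because the b argument is omitted
round(mat3 %*% inv3, 10)       # 2 x 2 identity matrix, up to rounding
b <- c(1, 2)
x <- solve(mat3, b)            # solves mat3 %*% x = b without forming the inverse
mat3 %*% x                     # reproduces b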

5 Matrix Operations IV: Decomposition

> help(eigen)
Spectral Decomposition of a Matrix
Description: Computes eigenvalues and eigenvectors of numeric (double, integer, logical) or complex matrices.
Usage:
  eigen(x, symmetric, only.values = FALSE, EISPACK = FALSE)
Arguments:
  x            a numeric or complex matrix whose spectral decomposition is to be computed. Logical matrices are coerced to numeric.
  symmetric    if TRUE, the matrix is assumed to be symmetric (or Hermitian if complex) and only its lower triangle (diagonal included) is used. If symmetric is not specified, the matrix is inspected for symmetry.
  only.values  if TRUE, only the eigenvalues are computed and returned; otherwise both eigenvalues and eigenvectors are returned.

6 Matrix Operations IV: Decomposition

# perform eigenvalue and eigenvector analysis
> mat3eig = eigen(mat3)
> mat3eig
$values
[1]
$vectors
     [,1] [,2]
[1,]
[2,]
# get the first eigenvalue
> mat3eig$values[1]
# get the second eigenvector
> mat3eig$vectors[,2]
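As a supplementary check (not in the original slides), every eigenpair returned by eigen() satisfies A %*% v = lambda * v, and a matrix with distinct eigenvalues can be rebuilt from its decomposition as A = V %*% diag(lambda) %*% solve(V):

# verify the eigen-decomposition of the 2 x 2 matrix used above
A <- matrix(c(22, 28, 49, 64), nrow = 2)   # mat3 from the earlier slides
e <- eigen(A)
A %*% e$vectors[, 1]                       # equals e$values[1] * e$vectors[, 1]
e$values[1] * e$vectors[, 1]
V <- e$vectors
round(V %*% diag(e$values) %*% solve(V), 10)   # recovers the original matrix A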

7 Principal Component Analysis I

# get help
> help(prcomp)
# read data from the data file
> sbp = read.table("table11.9.dat.txt", header=TRUE)
> sbp24 = sbp[c(2,3,4)]
> pcaresult = prcomp(sbp24)
> pcaresult
Standard deviations:
[1]
Rotation:
              PC1  PC2  PC3
Birthweight
Age
SBP
> summary(pcaresult)
Importance of components:
                         PC1  PC2  PC3
Standard deviation
Proportion of Variance
Cumulative Proportion
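To connect prcomp() back to the eigen-decomposition slides, the sketch below (an addition, assuming the sbp24 data frame created above) reproduces the principal components by hand from the covariance matrix; note that prcomp() centers the data and, unless scale. = TRUE is given, does not rescale the variables:

# manual PCA from the covariance matrix, assuming sbp24 from this slide
X <- scale(sbp24, center = TRUE, scale = FALSE)   # center each column
e <- eigen(cov(X))                                # spectral decomposition of the covariance
sqrt(e$values)      # matches pcaresult$sdev (standard deviations of the PCs)
e$vectors           # matches pcaresult$rotation, up to the sign of each column
X %*% e$vectors     # matches pcaresult$x (the scores), up to sign
# when variables have very different units, prcomp(sbp24, scale. = TRUE)
# performs PCA on the correlation matrix instead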

8 Principal Component Analysis II

# plot the variance explained by each principal component
> plot(pcaresult)
# print out the data in PCA coordinates
> pcaresult$x
# check correlations in PCA coordinates
> cor(pcaresult$x)
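A common next step, sketched here as an addition and again assuming the pcaresult object from the previous slide, is to plot the proportion of variance explained and a biplot of the first two components:

# scree-style plots and a biplot, assuming pcaresult exists
pve <- pcaresult$sdev^2 / sum(pcaresult$sdev^2)   # proportion of variance explained
plot(pve, type = "b", xlab = "Principal component",
     ylab = "Proportion of variance explained")
plot(cumsum(pve), type = "b", xlab = "Principal component",
     ylab = "Cumulative proportion of variance")
biplot(pcaresult)    # observations and variable loadings on PC1 vs PC2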

9 K-Means Clustering I

# check the data points (the built-in iris data set)
> iris
# ggplot2 is needed for the plots below
> library(ggplot2)
# plot the data, colored by species
> ggplot(iris, aes(Petal.Length, Petal.Width, color = Species)) + geom_point()
# perform k-means analysis using 3 clusters
# help(kmeans)
> set.seed(20)
> iriscluster = kmeans(iris[, 3:4], 3, nstart = 20)
> iriscluster
# compare the clusters with the species
> table(iriscluster$cluster, iris$Species)
# plot the data, colored by cluster
> iriscluster$cluster = as.factor(iriscluster$cluster)
> ggplot(iris, aes(Petal.Length, Petal.Width, color = iriscluster$cluster)) + geom_point()
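The lab fixes k = 3 because iris contains three species; when the number of clusters is not known in advance, a common heuristic (added here as a sketch) is to plot the total within-cluster sum of squares against k and look for an "elbow":

# elbow plot for choosing k on the iris petal measurements
set.seed(20)
wss <- sapply(1:8, function(k)
  kmeans(iris[, 3:4], centers = k, nstart = 20)$tot.withinss)
plot(1:8, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")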

10 K-Means Clustering II

11 Hierarchical Clustering I

# check the help
> help(hclust)
> help(dist)
# perform hierarchical clustering analysis using the distance matrix
> hclusters = hclust(dist(iris[, 3:4]))
# plot the cluster dendrogram
> plot(hclusters)
# cut the tree into 3 clusters
> help(cutree)
> clustercut = cutree(hclusters, 3)
# compare the clusters with the species
> table(clustercut, iris$Species)
# use a different linkage method
> clusters = hclust(dist(iris[, 3:4]), method = 'average')
> plot(clusters)
> clustercut = cutree(clusters, 3)
> table(clustercut, iris$Species)
> ggplot(iris, aes(Petal.Length, Petal.Width, color = clustercut)) + geom_point()
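One small caveat about the last ggplot() call: clustercut is numeric, so the points are colored along a continuous gradient; converting the labels to a factor (sketched below as an addition, reusing clustercut from above) gives discrete cluster colors, and cutree() can also cut the dendrogram at a chosen height instead of at a fixed number of clusters:

# discrete cluster colors and an alternative way to cut the tree
library(ggplot2)
ggplot(iris, aes(Petal.Length, Petal.Width, color = factor(clustercut))) +
  geom_point() + labs(color = "Cluster")
# cut the average-linkage dendrogram at height h rather than into k groups
clusters <- hclust(dist(iris[, 3:4]), method = "average")
table(cutree(clusters, h = 1.5), iris$Species)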

12 Hierarchical Clustering II

13 The End
