Package intansv. October 14, 2018

Similar documents
Package clonotyper. October 11, 2018

Package cellscape. October 15, 2018

Package NarrowPeaks. September 24, 2012

Package Path2PPI. November 7, 2018

Package GeneExpressionSignature

Package covrna. R topics documented: September 7, Type Package

Variant visualisation and quality control

RGP finder: prediction of Genomic Islands

Introduction to analyzing NanoString ncounter data using the NanoStringNormCNV package

BLAST. Varieties of BLAST

Synteny Portal Documentation

Package flowmap. October 8, 2018

Package NTW. November 20, 2018

Package genphen. March 18, 2019

Supplementary Material

Comparing whole genomes

Package Rdisop. R topics documented: August 20, Title Decomposition of Isotopic Patterns Version Date

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

Introduction to PLINK H3ABionet Course Covenant University, Nigeria

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Package dhga. September 6, 2016

GS Analysis of Microarray Data

Package RootsExtremaInflections

Package chimeraviz. April 13, 2019

Supplementary Information for Discovery and characterization of indel and point mutations

GS Analysis of Microarray Data

A Laplacian of Gaussian-based Approach for Spot Detection in Two-Dimensional Gel Electrophoresis Images

Package accelmissing

Become a Microprobe Power User Part 2: Qualitative & Quantitative Analysis

Fast speaker diarization based on binary keys. Xavier Anguera and Jean François Bonastre

Computational approaches for functional genomics

Mathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007

Package WGDgc. October 23, 2015

Supplementary Information

Package DirichletMultinomial

RNA-seq. Differential analysis

Package SEMModComp. R topics documented: February 19, Type Package Title Model Comparisons for SEM Version 1.0 Date Author Roy Levy

Package vhica. April 5, 2016

ChemmineR: A Compound Mining Toolkit for Chemical Genomics in R

The OmicCircos usages by examples

Package locfdr. July 15, Index 5

Basic Local Alignment Search Tool

Sequence analysis and comparison

The Schrödinger KNIME extensions

MACAU 2.0 User Manual

Tests for Two Coefficient Alphas

Package rela. R topics documented: February 20, Version 4.1 Date Title Item Analysis Package with Standard Errors

Whole Genome Alignments and Synteny Maps

Package BRAIN. June 30, 2018

Haploid & diploid recombination and their evolutionary impact

Package AMA. October 27, 2010

Census Mapping with ArcGIS

Multiple Whole Genome Alignment

Math 361. Day 3 Traffic Fatalities Inv. A Random Babies Inv. B

Package aspi. R topics documented: September 20, 2016

Package IMMAN. May 3, 2018

Package depth.plot. December 20, 2015

Package NanoStringDiff

Package metafuse. October 17, 2016

Package rgabriel. February 20, 2015

Comparative Genomics II

Algorithm User Guide:

ProMass Deconvolution User Training. Novatia LLC January, 2013

Esri UC2013. Technical Workshop.

ChIP-seq analysis M. Defrance, C. Herrmann, S. Le Gras, D. Puthier, M. Thomas.Chollier

Package sindyr. September 10, 2018

BIOINFORMATICS: An Introduction

2013 Eric Pitman Summer Workshop in Computational Science....an introduction to R, statistics, programming, and getting to know datasets

Package ShrinkCovMat

Moving into the information age: From records to Google Earth

Applications of genome alignment

Supplementary Materials for

Introduction to analyzing NanoString ncounter data using the NanoStringNormCNV package

Geoprocessing Tools at ArcGIS 9.2 Desktop

Eric Pitman Summer Workshop in Computational Science

Package plw. R topics documented: May 7, Type Package

Package BRAIN. October 16, 2018

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Package MultisiteMediation

SPOTTED cdna MICROARRAYS

The OmicCircos usages by examples

CHAPTER 12 THE CELL CYCLE. Section A: The Key Roles of Cell Division

GIS CONCEPTS ARCGIS METHODS AND. 2 nd Edition, July David M. Theobald, Ph.D. Natural Resource Ecology Laboratory Colorado State University

SUPPLEMENTARY INFORMATION

The Geodatabase Working with Spatial Analyst. Calculating Elevation and Slope Values for Forested Roads, Streams, and Stands.

ncounter PlexSet Data Analysis Guidelines

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Model Accuracy Measures

Package mirlastic. November 12, 2015

Package platetools. R topics documented: June 25, Title Tools and Plots for Multi-Well Plates Version 0.1.1

Package gtheory. October 30, 2016

Package MACAU2. R topics documented: April 8, Type Package. Title MACAU 2.0: Efficient Mixed Model Analysis of Count Data. Version 1.

Polar alignment in 5 steps based on the Sánchez Valente method

RNA evolution and Genotype to Phenotype maps

DNA, Chromosomes, and Genes

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data

CoB_Bounds_Full_201802

ASTERICS - H Abell 1656: the Coma Cluster of Galaxies

Package KMgene. November 22, 2017

ATLAS of Biochemistry

Transcription:

Title Integrative analysis of structural variations Package intansv October 14, 2018 This package provides efficient tools to read and integrate structural variations predicted by popular softwares. Annotation and visulation of structural variations are also implemented in the package. Version 1.22.0 Author <ywhzau@gmail.com> Maintainer <ywhzau@gmail.com> biocviews Genetics, Annotation, Sequencing, Software Depends R (>= 2.14.0), plyr, ggbio, GenomicRanges Imports BiocGenerics, IRanges License Artistic-2.0 git_url https://git.bioconductor.org/packages/intansv git_branch RELEASE_3_7 git_last_commit 0c4dd1e git_last_commit_date 2018-07-19 Date/Publication 2018-10-13 R topics documented: methodsmerge....................................... 2 plotchromosome...................................... 3 plotregion.......................................... 4 readbreakdancer...................................... 5 readcnvnator........................................ 7 readdelly.......................................... 8 readlumpy......................................... 9 readpin.......................................... 10 readsoftsearch....................................... 11 readsvseq.......................................... 12 svannotation........................................ 13 Index 15 1

2 methodsmerge methodsmerge Integrate structural variations predicted by different methods Integrate predictions of different tools to provide more reliable structural variations. methodsmerge(..., others=null, overlapperdel = 0.8, overlapperdup = 0.8, overlapperinv = 0.8, nummethodssupdel = 2, nummethodssupdup = 2, nummethodssupinv = 2)... results of different SVs predictions read in to R by intansv. others overlapperdel overlapperdup overlapperinv a data frame of structural variations predicted by other tools. Deletions predicted by different methods that have reciprocal coordinate overlap larger than this threshold would be clustered together Duplications predicted by different methods that have reciprocal coordinate overlap larger than this threshold would be clustered together Inversions predicted by different methods that have reciprocal coordinate overlap larger than this threshold would be clustered together nummethodssupdel Deletion clusters supportted by no more than this threshold of read support would be discarded nummethodssupdup Duplication clusters supportted by no more than this threshold of read support would be discarded nummethodssupinv Inversion clusters supportted by no more than this threshold of read support would be discarded A structural variation (etion, duplication, inversion et al.) may be reported by different tools. However, the boundaries of this structural variation predicted by different tools don t always agree with each other. Predictions of different methods with reciprocal overlap more than 80 percent were merged. Structural varions supported by only one method were discarded. dup inv the integrated etions of different methods. the integrated duplications of different methods. the integrated inversions of different methods.

plotchromosome 3 breakdancer <- readbreakdancer(system.file("extdata/zs97.breakdancer.sv", package="intansv")) str(breakdancer) cnvnator <- readcnvnator(system.file("extdata/cnvnator",package="intansv")) str(cnvnator) svseq <- readsvseq(system.file("extdata/svseq2",package="intansv")) str(svseq) ly <- readdelly(system.file("extdata/ly",package="intansv")) str(ly) pin <- readpin(system.file("extdata/pin",package="intansv")) str(pin) sv_all_methods <- methodsmerge(breakdancer,pin,cnvnator,ly,svseq) str(sv_all_methods) sv_all_methods.1 <- methodsmerge(breakdancer,pin,cnvnator,ly,svseq, overlapperdel=0.7) str(sv_all_methods.1) sv_all_methods.2 <- methodsmerge(breakdancer,pin,cnvnator,ly,svseq, overlapperdel=0.8, nummethodssupdel=3) str(sv_all_methods.2) plotchromosome Display the chromosome distribution of structural variations Display the chromosome distribution of structural variations by splitting the chromosomes into windows of specific size and counting the number of structural variations in each window. plotchromosome(genome, structuralvariation, windowsize=1000000) genome A data frame with ID and length of all Chromosomes. structuralvariation A list of structural variations. windowsize A specific size (in base pair) to split chromosomes into windows.

4 plotregion To visualize the distribution of structural variations in the whole genome, chromosomes were splitted into windows of specific size (default 1 Mb) and the number of structural variations in each window were counted. The number of structural variations were shown using circular barplot. A circular plot with five layers: the circular view of genome ideogram. the chromosome coordinates labels. the circular barplot of number of etions in each chromosome window. the circular barplot of number of duplications in each chromosome window. the circular barplot of number of inversions in each chromosome window. ly <- readdelly(system.file("extdata/ly",package="intansv")) str(ly) genome.file.path <- system.file("extdata/chr05_chr10.genome.txt", package="intansv") genome <- read.table(genome.file.path, head=true, as.is=true) str(genome) plotchromosome(genome,ly,1000000) plotregion Display structural variations in a specific genomic region Display the structural variations in a specific genomic region in circular view. plotregion(structuralvariation, genomeannotation, regionchromosome, regionstart, regionend)

readbreakdancer 5 structuralvariation A list of structural variations. genomeannotation A data frame of genome annotations. regionchromosome The chromosome identifier of a specific region to view. regionstart regionend The start coordinate of a specific region to view. The end coordinate of a specific region to view. Different SVs were shown as rectangles in different layers. See the package vignette and the example dataset for more details. A circular plot of all the structural variations and genes in a specific region with four layers: The composition of genes of a specific genomic region. The composition of etions of a specific genomic region. The composition of duplications of a specific genomic region. The composition of inversions of a specific genomic region. ly <- readdelly(system.file("extdata/ly",package="intansv")) str(ly) anno.file.path <- system.file("extdata/chr05_chr10.anno.txt", package="intansv") msu_gff_v7 <- read.table(anno.file.path, head=true, as.is=true) str(msu_gff_v7) plotregion(ly,msu_gff_v7,"chr05",1,200000) readbreakdancer Read in the structural variations predicted by breakdancer Reading in the structural variations predicted by breakdancer, filtering low quality predictions and merging overlapping predictions.

6 readbreakdancer readbreakdancer(file="", scorecutoff=60, readssupport=3, regsizelowercutoff=100, regsizeuppercutoff=10000000, method="breakdancer",...) file scorecutoff the output file of breakdancer. the minimum score for a structural variation to be read in. readssupport the minimum read pair support for a structural variation to be read in. regsizelowercutoff the minimum size for a structural variation to be read in. regsizeuppercutoff the maximum size for a structural variation to be read in. method a tag to assign to the result of this function.... parameters passed to read.table. The predicted SVs could be further filtered by score, number of read pairs supporting the occurence of a specific SV, and the predicted size of SVs to get more reliable SVs. See our paper for more details. inv the etions predicted by breakdancer. the inversions predicted by breakdancer. breakdancer <- readbreakdancer(system.file("extdata/zs97.breakdancer.sv", package="intansv")) str(breakdancer)

readcnvnator 7 readcnvnator Read in the structural variations predicted by CNVnator Reading the structural variations predicted by CNVnator, filtering low quality predictions and merging overlapping predictions. readcnvnator(datadir=".", regsizelowercutoff=100, regsizeuppercutoff=1000000, method="cnvnator") datadir the directory that contain the output files of CNVnator. regsizelowercutoff the minimum size for a structural variation to be read. regsizeuppercutoff the maximum size for a structural variation to be read. method a tag to assign to the result of this function. The predicted SVs could be further filtered by the predicted size of SVs to get more reliable SVs. See our paper for more details. The directory that specified by the parameter "datadir" should only contain the predictions of CNVnator. See the example dataset for more details. dup the etions predicted by CNVnator. the duplications predicted by CNVnator. cnvnator <- readcnvnator(system.file("extdata/cnvnator",package="intansv")) str(cnvnator)

8 readdelly readdelly Read in the structural variations predicted by DELLY Reading the structural variations predicted by DELLY, filtering low quality predictions and merging overlapping predictions. readdelly(datadir=".", regsizelowercutoff=100, regsizeuppercutoff=10000000, readssupport=3, method="delly", pass=true, minmappingquality=20) datadir a directory containing the prediction results of DELLY. regsizelowercutoff the minimum size for a structural variation to be read. regsizeuppercutoff the maximum size for a structural variation to be read. readssupport the minimum read pair support for a structural variation to be read. method a tag to assign to the result of this function. pass set pass=true to remove LowQual SVs reported by DELLY. minmappingquality the minimum mapping quality for a SV to be read. The predicted SVs could be further filtered by the number of read pairs supporting the occurence of a specific SV, and the predicted size of SVs to get more reliable SVs. See our paper for more details. The directory that specified by the parameter "datadir" should only contain the predictions of DELLY. See the example dataset for more details. dup inv the etions predicted by DELLY. the duplications predicted by DELLY. the inversions predicted by DELLY. ly <- readdelly(system.file("extdata/ly",package="intansv")) str(ly)

readlumpy 9 readlumpy Read in the structural variations predicted by Lumpy Reading the structural variations predicted by Lumpy, filtering low quality predictions and merging overlapping predictions. readlumpy(file="", regsizelowercutoff=100, regsizeuppercutoff=1000000, readssupport=3, method="lumpy",...) file the file containing the prediction results of Lumpy. regsizelowercutoff the minimum size for a structural variation to be read. regsizeuppercutoff the maximum size for a structural variation to be read. readssupport method the minimum read pair support for a structural variation to be read. a tag to assign to the result of this function.... parameters passed to read.table. The predicted SVs could be further filtered by the number of reads supporting the occurence of a specific SV, and the predicted size of SVs to get more reliable SVs. See our paper for more details. dup inv the etions predicted by Lumpy. the duplications predicted by Lumpy. the inversions predicted by Lumpy. lumpy <- readlumpy(system.file("extdata/zs97.lumpy.pesr.bedpe",package="intansv")) str(lumpy)

10 readpin readpin Read in the structural variations predicted by Pin Reading the structural variations predicted by Pin, filtering low quality predictions and merging overlapping predictions. readpin(datadir=".", regsizelowercutoff=100, regsizeuppercutoff=1000000, readssupport=3, method="pin") datadir the directory containing the prediction results of Pin. regsizelowercutoff the minimum size for a structural variation to be read. regsizeuppercutoff the maximum size for a structural variation to be read. readssupport method the minimum read pair support for a structural variation to be read. a tag to assign to the result of this function. The predicted SVs could be further filtered by the number of reads supporting the occurence of a specific SV, and the predicted size of SVs to get more reliable SVs. See our paper for more details. The directory that specified by the parameter "datadir" should only contain the predictions of Pin. The etions output files should be named using the suffix "_D", the duplications output files should be named using the suffix "_TD", and the inversions output files should be named using the suffix "_INV". See the example dataset for more details. dup inv the etions predicted by Pin. the duplications predicted by Pin. the inversions predicted by Pin. pin <- readpin(system.file("extdata/pin",package="intansv")) str(pin)

readsoftsearch 11 readsoftsearch Read in the structural variations predicted by SoftSearch Reading the structural variations predicted by SoftSearch, filtering low quality predictions and merging overlapping predictions. readsoftsearch(file="", regsizelowercutoff=100, readssupport=3, method="softsearch", regsizeuppercutoff=1000000, softclipssupport=3,...) file the file containing the prediction results of SoftSearch. regsizelowercutoff the minimum size for a structural variation to be read. regsizeuppercutoff the maximum size for a structural variation to be read. readssupport the minimum read pair support for a structural variation to be read. method a tag to assign to the result of this function. softclipssupport the minimum soft clip support for a structural variation to be read.... parameters passed to read.table The predicted SVs could be further filtered by the number of reads supporting the occurence of a specific SV, and the predicted size of SVs to get more reliable SVs. See our paper for more details. dup inv the etions predicted by SoftSearch. the duplications predicted by SoftSearch. the inversions predicted by SoftSearch. softsearch <- readsoftsearch(system.file("extdata/zs97.softsearch",package="intansv")) str(softsearch)

12 readsvseq readsvseq Read in the structural variations predicted by SVseq2 Reading the structural variations predicted by SVseq2, filtering low quality predictions and merging overlapping predictions. readsvseq(datadir=".", regsizelowercutoff=100, method="svseq2", regsizeuppercutoff=1000000, readssupport=3) datadir a directory containing the predictions of SVseq2. regsizelowercutoff the minimum size for a structural variation to be read. regsizeuppercutoff the maximum size for a structural variation to be read. readssupport method the minimum read pair support for a structural variation to be read. a tag to assign to the result of this function. The predicted SVs could be further filtered by the number of reads supporting the occurence of a specific SV, and the predicted size of SVs to get more reliable SVs. See our paper for more details. The directory that specified by the parameter "datadir" should only contain the predictions of SVseq2. The etions output files should be named using the suffix ".". See the example dataset for more details. the etions predicted by SVseq2. svseq <- readsvseq(system.file("extdata/svseq2",package="intansv")) str(svseq)

svannotation 13 svannotation Annotation of structural variations Annotate the effect caused by structural variations to genes and elements of genes. svannotation(structuralvariation,genomeannotation) structuralvariation A data frame of structural variations. genomeannotation A data frame of genome annotations. A structural variation (etion, duplication, inversion et al.) could affect the structure of a specific gene, including etion of introns/exons, etion of whole gene, et al.. This function gives the detailed effects caused by structural variations to genes and elements of genes. The parameter "structuralvariation" should be a data frame with three columns: chromosome the chromosome of a structural variation. pos1 the start coordinate of a structural variation. pos2 the end coordinate of a structural variation. A data frame with the following columns: chromosome pos1 pos2 size info tag start end strand ID the chromosome of a structural variation. the start coordinate of a structural variation. the end coordinate of a structural variation. the size of a structural variation. information on a structural variation. the tag of a genomic element. the start coordinate of a genomic element. the end coordinate of a genomic element. the strand of a genomic element. the ID of a genomic element.

14 svannotation breakdancer <- readbreakdancer(system.file("extdata/zs97.breakdancer.sv", package="intansv")) str(breakdancer) msu_gff_v7 <- read.table(system.file("extdata/chr05_chr10.anno.txt", package="intansv"), head=true, as.is=true, sep="\t") breakdancer.anno <- llply(breakdancer,svannotation, genomeannotation=msu_gff_v7) str(breakdancer.anno)

Index methodsmerge, 2 plotchromosome, 3 plotregion, 4 readbreakdancer, 5 readcnvnator, 7 readdelly, 8 readlumpy, 9 readpin, 10 readsoftsearch, 11 readsvseq, 12 svannotation, 13 15