expect_equal for comparing numerical values.
TransformCatalog in case R was configured and built in a way that did not support long double.
Updated documentation of ReadCatalog and ReadCatalogInternal as there are no ID96 catalogs in COSMIC v3.2.
Changed the URL of the COSMIC mutational signatures page to the redirected URL.
Updated some tests for TransformCatalog in case R was configured and built in a way that did not support long double.
Added the argument strict back to ReadCatalog for backward compatibility; strict is now ignored and deprecated.
Robustified function StandardChromNameNew to select the column that contains chromosome names by name instead of by column index.
Fixed a bug in function CheckSeqContextInVCF.
Fixed a bug in function PlotCatalog.SBS96Catalog when plotting the X axis after setting par(tck = 0).
Changed PlotCatalog to round the mutation counts for each main type for SBS96, SBS192, DBS78 and ID counts catalogs in case the input is a reconstructed counts catalog.
Updated function AdjustNumberOfCores not to throw a message on MS Windows machines.
Added an additional argument ylabels to PlotCatalog and PlotCatalogToPdf. When ylabels = FALSE, the y axis labels are not plotted. Implemented for SBS96Catalog, DBS78Catalog and IndelCatalog.
Enabled arguments grid, upper and xlabels in PlotCatalog and PlotCatalogToPdf for DBS78Catalog and IndelCatalog.
ReadCatalog to import files with:
ReadCatalog function, e.g. ReadCatalog.SBS96Catalog. Now they are in data-raw/obsolete-files/ReadCatalogMethods.R.
ConvCatalogToICAMS to convert SigProfiler/COSMIC-formatted catalog files into ICAMS catalog objects. Now these functions are in data-raw/obsolete-files/ConvCatalogToICAMS.R, and their functionalities are integrated into ReadCatalog.
ReadCatalog to remove rows that contain NA in the data table read in. Otherwise the number of rows would not be accurate enough to infer the correct catalog type.
InferClassOfCatalogForRead to data-raw/obsolete-files/InferClassOfCatalogForRead.R.
CreateOneColDBSMatrix when returning a 1-column DBS144 matrix with all values being 0 and the correct row labels.
Added an additional argument tmpdir in function AddRunInformation.
Updated functions CheckAndRemoveDiscardedVariants and MakeDataFrameFromVCF to check for variants that have the same REF and ALT.
Created a new temp directory when generating a zip archive from VCFs to avoid zipping unnecessary files in the output.
Fixed a bug in function AddRunInformation to allow ref.genome to be a Bioconductor package.
Fixed bugs in functions CreateOneColSBSMatrix, CreateOneColDBSMatrix and CreateOneColIDMatrix when all the variants in the input VCFs should be discarded.
Updated function CheckAndFixChrNames to give a warning instead of an error when “23” and “X”, or “24” and “Y”, appear in the chromosome names of the VCF at the same time. CheckAndFixChrNames will change “23” to “X” and “24” to “Y” internally for downstream processing.
Changed some code in functions AddTranscript, CreateOneColSBSMatrix and CreateOneColDBSMatrix to use functions from package dplyr instead of data.table due to a segfault error.
RemoveRowsWithDuplicatedCHROMAndPOSNew to remove variants that have the same CHROM, POS and REF.
files in function VCFsToZipFile.
Fixed a bug in ReadAndSplitVCFs for merging adjacent SBSs into DBSs when variant.caller is mutect.
Fixed a bug in CheckAndRemoveDiscardedVariants for removing wrong DBS variants.
CheckAndRemoveDiscardedVariants to remove wrong DBS variants that have the same base at the same position in REF and ALT (e.g. TA > TT or GT > CT).
name.of.VCF in function MakeDataFrameFromVCF for better error reporting.
Updated function MakeDataFrameFromVCF for better error reporting when reading in files that are not actually VCFs.
Updated function ReadVCFs to automatically change the number of cores to 1 on Windows instead of throwing an error.
CheckAndFixChrNames for returning the correct number of chromosome names.
stop.on.error and the tryCatch code in function VCFsToCatalogs for better tracing if the function stops on error.
Added argument stop.on.error to VCFsToCatalogs; if FALSE, return a list with a single element named error.
Added new internal function CheckAndFixChrNamesForTransRanges. The chromosome names in exported data TranscriptRanges don’t have “chr”. ICAMS now checks the chromosome name format in the input VCF and updates the trans.ranges chromosome names in function AddTranscript if needed.
Added new argument name.of.VCF in functions AnnotateSBSVCF and AnnotateDBSVCF for better error reporting.
Changed the return value of ReadCatalog to include a possible attribute “error” and to allow for not calling stop() on error.
For a stranded catalog, as.catalog and ReadCatalog will silently convert region = “genome” to “transcript”.
Updated function AddTranscript to check whether the format of VCF chromosome names is consistent with that in the trans.ranges used.
Removed documentation warnings related to
Some file reorganization.
CreateOneColSBSMatrix for showing a message when an SBS variant's reference base in ref.genome does not match the reference base in the VCF file.
Enabled functions PlotCatalog and PlotCatalogToPdf to plot a numeric matrix, a numeric data.frame, or a vector denoting the mutation counts.
Added new internal function AdjustNumberOfCores to change the number of cores automatically to 1 if the operating system is Windows.
Added a test for processing a VCF with an unknown variant caller.
Added new internal functions SplitSBSVCF, SplitOneVCF, SplitListOfVCFs, VCFsToZipFileXtra, WriteSBS96CatalogAsTsv, ReadSBS96CatalogFromTsv and GetConsensusVAF.
Added new exported functions ReadAndSplitVCFs, VCFsToCatalogs, VCFsToCatalogsAndPlotToPdf and VCFsToZipFile.
Added new arguments filter.status and get.vaf.function in functions ReadVCF, ReadVCFs, ReadAndSplitVCFs, VCFsToCatalogs, VCFsToCatalogsAndPlotToPdf and VCFsToZipFile.
Added a new internal data catalog.row.headers.SBS.96.v1.
Added new argument max.vaf.diff in internal functions SplitOneVCF and SplitListOfVCFs and exported functions ReadAndSplitVCFs, VCFsToCatalogs, VCFsToCatalogsAndPlotToPdf and VCFsToZipFile.
Added new dependency package parallel.
Added new dependency package R.utils for data.table::fread to read gz and bz2 files directly.
Added new argument num.of.cores in internal functions ReadVCFs and SplitListOfVCFs and exported functions ReadAndSplitVCFs, VCFsToCatalogsAndPlotToPdf, VCFsToCatalogs, VCFsToZipFile, VCFsToIDCatalogs, VCFsToSBSCatalogs and VCFsToDBSCatalogs.
Added the new ... (ellipsis) argument in internal functions ReadVCF and ReadVCFs and exported functions ReadAndSplitVCFs, VCFsToCatalogsAndPlotToPdf, VCFsToCatalogs and VCFsToZipFile.
Added new argument mc.cores in internal function GetConsensusVAF.
MakeDataFrameFromVCF to use data.table::fread instead of read.csv.
MakeDataFrameFromVCF when reading in a VCF from a URL.
Updated function CreateOneColSBSMatrix to throw a message instead of an error when there are SBS variants whose reference base in ref.genome does not match the reference base in the VCF file.
Updated function MakeVCFDBSdf to inherit column information from the original SBS VCF.
Changed the words in legend for DBS144 plot from “Transcribed”, “Untranscribed” to “Transcribed strand” and “Untranscribed strand”.
Updated the documentation for exported data all.abundance.
Updated function ReadCatalog.COMPOSITECatalog not to convert “::” to “..” in the column names.
Updated various functions in ICAMS to generate catalogs with zero mutation counts from empty VCFs.
Added a section “ID classification” in the documentation for exported data catalog.row.order.
New argument suppress.discarded.variants.warnings in exported function AnnotateIDVCF with default value TRUE.
Added information on another paper to AddRunInformation: “Characterization of colibactin-associated mutational signature in an Asian oral squamous cell carcinoma and in other mucosal tumor types”, Genome Research 2020, https://doi.org/10.1101/gr.255620.119.
Changed the format of DOIs in DESCRIPTION according to CRAN policy.
Changed back the return value of ReadStrelkaIDVCFs, ReadStrelkaSBSVCFs and ReadMutectVCFs to a list of data frames with no variants discarded.
Combined all the discarded variants from ReadAndSplitMutectVCFs and ReadAndSplitStrelkaSBSVCFs under one element, discarded.variants, in the return value. An extra column, discarded.reason, was added to show the details.
Updated internal functions ReadVCF and ReadVCFs not to remove any discarded variants.
“chr” is no longer removed from the CHROM column when reading in VCFs.
CheckAndReturnSBSMatrix, CheckAndReturnDBSMatrix, CreateOneColSBSMatrix, CreateOneColDBSMatrix, VCFsToSBSCatalogs and VCFsToDBSCatalogs.
CalculateExpressionLevel for the edge case.
CreateOneColIDMatrix when the ID.class contains a non-canonical representation of the ID mutation type.
The return value of exported function ReadStrelkaIDVCFs now sometimes contains a new element, discarded.variants. This element appears when variants were discarded immediately after reading in the VCFs. At present these are variants that have duplicated chromosome/positions and variants that have illegal chromosome names. This means that users must check the return value to see whether discarded.variants is present and remove it before passing the return value to a function that expects a list of VCFs. Code in ICAMS that takes lists of VCFs already checks for this element and removes it if present.
Added argument return.annotated.vcfs to exported function VCFsToIDCatalogs. The default value for the argument is FALSE to be consistent with other functions.
Argument return.annotated.vcfs in functions VCFsToSBSCatalogs, VCFsToDBSCatalogs, VCFsToIDCatalogs, MutectVCFFilesToCatalog, MutectVCFFilesToCatalogAndPlotToPdf, MutectVCFFilesToZipFile, StrelkaSBSVCFFilesToCatalog, StrelkaSBSVCFFilesToCatalogAndPlotToPdf, StrelkaSBSVCFFilesToZipFile, StrelkaIDVCFFilesToCatalog, StrelkaIDVCFFilesToCatalogAndPlotToPdf and StrelkaIDVCFFilesToZipFile.
Argument suppress.discarded.variants.warnings in functions ReadAndSplitMutectVCFs, ReadAndSplitStrelkaSBSVCFs, VCFsToSBSCatalogs, VCFsToDBSCatalogs, VCFsToIDCatalogs, MutectVCFFilesToCatalog, MutectVCFFilesToCatalogAndPlotToPdf, MutectVCFFilesToZipFile, StrelkaSBSVCFFilesToCatalog, StrelkaSBSVCFFilesToCatalogAndPlotToPdf, StrelkaSBSVCFFilesToZipFile, StrelkaIDVCFFilesToCatalog, StrelkaIDVCFFilesToCatalogAndPlotToPdf and StrelkaIDVCFFilesToZipFile.
Added documentation to exported functions ReadAndSplitStrelkaSBSVCFs, StrelkaSBSVCFFilesToCatalog, StrelkaSBSVCFFilesToCatalogAndPlotToPdf and StrelkaSBSVCFFilesToZipFile.
Added information on the “ID classification” to the documentation of functions generating ID catalogs, FindDelMH and FindMaxRepeatDel.
Minor changes to documentation of functions PlotCatalog, PlotCatalogToPdf, StrelkaSBSVCFFilesToZipFile, StrelkaIDVCFFilesToZipFile and MutectVCFFilesToZipFile.
Updated documentation for the return value of functions StrelkaIDVCFFilesToCatalog, StrelkaIDVCFFilesToCatalogAndPlotToPdf, StrelkaIDVCFFilesToZipFile and VCFsToIDCatalogs to make it clearer to the user.
Added new exported data of catalog row order for SBS96, SBS1536 and DBS78 in SigProfiler format to catalog.row.order.sp.
New internal functions ConvertICAMSCatalogToSigProSBS96, ReadVCF and ReadVCFs.
New exported function GetFreebayesVAF for calculating variant allele frequencies from Freebayes VCFs.
New test data for Strelka mixed VCF.
Added time zone information to file “run-information.txt” when calling functions MutectVCFFilesToZipFile, StrelkaSBSVCFFilesToZipFile and StrelkaIDVCFFilesToZipFile.
Enabled “counts” -> “counts.signature” catalog transformation when the source catalog has NULL abundance.
Added a legend for the SBS192 plot and changed the legend text for the SBS12 plot.
Added a second element, plot.object, to the list returned from function PlotCatalog for catalog types “SBS192Catalog”, “DBS78Catalog”, “DBS144Catalog” and “IndelCatalog”. The second element is a numeric vector giving the coordinates of the bar midpoints, useful for adding to the graph.
Made the return values from PlotCatalog and PlotCatalogToPdf invisible.
Improved time performance of GetMutectVAF, CanonicalizeDBS and CanonicalizeQUAD.
if statements in GetCustomKmerCounts, GetStrandedKmerCounts and GetGenomeKmerCounts.
CreateOneColIDMatrix when there is an NA ID category.
GetMutectVAF to check if the VCF is indeed a Mutect VCF.
CreateOneColDBSMatrix when the VCF does not have any variants in the transcribed region.
CalculatePValues when there is only a single expression value.
Created an internal function MakeDataFrameFromVCF to read in the data lines of a VCF.
New argument name.of.VCF in internal function CheckAndFixChrNames to make the error message more informative.
New argument name.of.VCF in exported function AnnotateIDVCF to make the error message more informative.
ReadStrelkaIDVCF to make the error message more informative.
AnnotateIDVCF to a list. The first element, annotated.vcf, contains the annotated VCF. If any rows are discarded, the function generates a warning and includes a second element, discarded.variants, in the returned list.
flag.mismatches deprecated in exported function AnnotateIDVCF. If there are mismatches to the reference, the function automatically discards these rows. Users can refer to the element discarded.variants in the return value for the discarded variants.
SplitStrelkaSBSVCF when there are no non.SBS mutations in the input.
MakeDataFrameFromMutectVCF when a Mutect VCF has no meta-information lines.
CreateOneColSBSMatrix when an annotated SBS VCF has variants in transcribed regions that all fall on transcripts on both strands.
CreateOneColDBSMatrix when an annotated DBS VCF has variants in transcribed regions that all fall on transcripts on both strands.
ReadAndSplitStrelkaSBSVCFs.
MutectVCFFilesToZipFile, StrelkaSBSVCFFilesToZipFile and StrelkaIDVCFFilesToZipFile.
trans.ranges to make it optional.
name.of.VCF in internal functions ReadStrelkaSBSVCF and ReadStrelkaIDVCF and exported function GetStrelkaVAF.
flag.mismatches in functions VCFsToIDCatalogs, MutectVCFFilesToCatalog, MutectVCFFilesToCatalogAndPlotToPdf, MutectVCFFilesToZipFile, StrelkaIDVCFFilesToCatalog, StrelkaIDVCFFilesToCatalogAndPlotToPdf and StrelkaIDVCFFilesToZipFile.
GetStrelkaVAF and GetMutectVAF to a data frame that contains the VAF and read depth information.
PlotCatalogToPdf a list. The first element is a logical value indicating whether the plot was successful. The second element is a list containing the strand bias statistics (only for SBS192Catalog with “counts” catalog.type, non-NULL abundance and argument plot.SBS12 = TRUE).
PlotCatalog and PlotCatalogToPdf: For class SBS96Catalog: (New) Allow setting ylim and cex. (New) For PlotCatalog (not PlotCatalogToPdf), allow plotting of a 96 x 2 catalog, in which case the result is a stacked bar chart. (New) Plot x axis tick marks if xlabels is not TRUE; set par(tck = 0) to suppress them. For class IndelCatalog: (New) Allow setting ylim.
GetCustomKmerCounts.
PlotTransBiasGeneExpToPdf so that ymax on the plot changes based on plot.type.
flat.abundance from “numeric” to “integer”.
TransformCatalog; see documentation for rationale.
TransformCatalog and updated its documentation for parameter target.abundance.
CheckAndFixChrNames and updated the automated tests.
TransformCatalog.
GetMutectVAF and updated the warning message to make it more informative.
cbind to check the attributes of the incoming catalogs and assign attributes accordingly.
TransformCatalog to check the attributes of the catalog to be transformed in the first place.
AnnotateSBSVCF, AnnotateDBSVCF and AnnotateIDVCF.
PlotTransBiasGeneExp and PlotTransBiasGeneExpToPdf.
names.of.VCFs in functions ReadAndSplitMutectVCFs, ReadAndSplitStrelkaSBSVCFs, ReadStrelkaIDVCFs, MutectVCFFilesToCatalog, MutectVCFFilesToCatalogAndPlotToPdf, StrelkaIDVCFFilesToCatalog, StrelkaIDVCFFilesToCatalogAndPlotToPdf, StrelkaSBSVCFFilesToCatalog and StrelkaSBSVCFFilesToCatalogAndPlotToPdf for users to specify the names of samples in the VCF files.
as.catalog.
gene.expression.data.HepG2 and gene.expression.data.MCF10A.
tumor.col.names in functions ReadAndSplitMutectVCFs, MutectVCFFilesToCatalog and MutectVCFFilesToCatalogAndPlotToPdf to specify the column of the VCF that contains sequencing statistics such as sequencing depth; this column is often called “unknown” in Mutect.
MutectVCFFilesToCatalog, MutectVCFFilesToCatalogAndPlotToPdf, StrelkaSBSVCFFilesToCatalog, StrelkaSBSVCFFilesToCatalogAndPlotToPdf, VCFsToSBSCatalogs, VCFsToDBSCatalogs and ReadCatalog informing the user how to change attributes of the generated catalog.
VCFsToIDCatalogs, StrelkaIDVCFFilesToCatalog and StrelkaIDVCFFilesToCatalogAndPlotToPdf a list; the 1st element is the spectrum catalog (previously the only return value); the 2nd element is a list of VCFs with additional annotations.
PlotCatalog a list. The first element is a logical value indicating whether the plot was successful. The second element is a numeric vector giving the coordinates of all the bar midpoints drawn, useful for adding to the graph (only implemented for SBS96Catalog).
output.file argument in MutectVCFFilesToCatalogAndPlotToPdf, StrelkaSBSVCFFilesToCatalogAndPlotToPdf and StrelkaIDVCFFilesToCatalogAndPlotToPdf so that an indicator of the catalog type plus “.pdf” is simply appended to the base output.file name. Also made this argument optional with sensible default behavior.
trans.ranges.GRCh37, trans.ranges.GRCh38 and trans.ranges.GRCm38.
FindDelMH, cryptic repeats (i.e. un-normalized deletions in a repeat such as GAGG deleted from CCCAGGGAGGGTCCC, which should be normalized to a deletion of AGGG) are now ignored with a warning rather than causing a stop.
FindDelMH, which previously did not flag the cryptic repeat in what is now the second example in the function documentation.
as.catalog supports creation of the catalog from a vector (interpreted as a 1-column matrix) and optionally infers the class from the number of rows in the input.