This function is used to remove the noisy reads from the BAM files. It uses as input the BAM file names, a gene table (usually containing individual exons, made using cast_gtf_to_genes), an expression matrix for each of these genes and a vector of abundance thresholds.

remove_noise_from_bams(
  bams,
  genes,
  expression,
  noise.thresholds,
  destination.files = base::paste0(base::basename(bams), ".noisefiltered.bam"),
  filter.by = c("gene", "exon"),
  make.index = FALSE,
  unique.only = TRUE,
  mapq.unique = 255,
  ...
)

Arguments

bams

a character vector of the BAM file names

genes

a tibble of the exons extracted from the gtf file; (usually the the output of cast_gtf_to_genes)

expression

the expression matrix or expression summary list, as calculated by calculate_expression_similarity_transcript

noise.thresholds

a vector of expression thresholds by sample; must be the same length as the number of BAM files, or a singular value to be used as a fixed noise threshold

destination.files

names for the output denoised BAM files; by default the same as the original files, appended with ".noisefiltered.bam", but created in the working directory

filter.by

Either "gene" (default) or "exon"; if filter.by="gene", a gene is removed from all BAM files if and only if all of its exons are below the corresponding noise thresholds; if filter.by="exon", then each exon is individually removed (from all samples) if it is below the corresponding noise thresholds.

make.index

whether a BAM index should be generated; if this is FALSE (the default) and no index exists, the function will exit with an error; the index needs to have the same name as each BAM file, but ending with .bam.bai

unique.only

whether only uniquely mapped reads should contribute to the expression of a gene/exon; default is TRUE

mapq.unique

The values of the mapping quality field in the BAM file that corresponds to uniquely mapped reads; by default, values of 255 are used as these correspond to the most popular aligners, but an adjustment might be needed; the mapq scores should be as follows: 255 for STAR, 60 for hisat2, 255 for bowtie in -k mode, 40 for bowtie2 default, 50 for tophat

...

arguments passed on to other methods

Value

Returns a matrix of the same dims as the expression matrix, with the noise removed. This matrix has no entries remaining below the noise threshold.

See also

Examples

bams <- rep(system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE), 2) genes <- data.frame("id" = 1:2, "gene_id" = c("gene1", "gene2"), "seqid" = c("seq1", "seq2"), "start" = 1, "end" = 1600) noise.thresholds <- c(0, 1) expression.summary = calculate_expression_similarity_transcript( bams = bams, genes = genes, mapq.unique = 99 )
#> Calculating expression similarity for 2 genes...
#> this process may take a long time...
#> ncores=1, running sequentially...
#> Finished! Time elapsed: 0.32 secs
remove_noise_from_bams( bams = bams, genes = genes, expression = expression.summary, noise.thresholds = noise.thresholds, destination.files = paste0(tempdir(), "/", basename(bams), ".noisefiltered.bam"), mapq.unique = 99 )
#> Denoising 2 BAM files...
#> filtering genes using the noise thresholds
#> doing filtering by gene
#> kept 2 entries out of 2
#> denoising BAM file 1 of 2
#> denoising BAM file 2 of 2