This function filters the gene table (usually created with cast_gtf_to_genes), keeping only genes above the noise thresholds. Its inputs are the gene table (usually with one row per exon), an expression matrix with one row per entry of that table, and a vector of noise thresholds, one per sample. It is used internally by remove_noise_from_bams to determine which genes to retain.

filter_genes_transcript(
  genes,
  expression.matrix,
  noise.thresholds,
  filter.by = c("gene", "exon"),
  ...
)

Arguments

genes

a tibble of the exons extracted from the GTF file (usually the output of cast_gtf_to_genes)

expression.matrix

the expression matrix, with one row per entry of genes and one column per sample; usually the expression.matrix element returned by calculate_expression_similarity_transcript

noise.thresholds

a vector of noise thresholds, one per sample (i.e. one per column of expression.matrix)

filter.by

Either "gene" (default) or "exon". If filter.by = "gene", a gene (as identified by its Ensembl ID) is removed if and only if all of its exons are below the corresponding noise thresholds. If filter.by = "exon", each exon is removed individually if it is below the corresponding noise thresholds.

...

arguments passed on to other methods
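The difference between the two filter.by modes can be sketched in base R. This is a minimal sketch, not the package's implementation: filter_by_noise_sketch is a hypothetical helper, and it assumes that rows of expression.matrix line up with rows of genes, that noise.thresholds holds one threshold per sample (column), and that an exon counts as above noise when it reaches its sample's threshold (>=) in at least one sample.

```r
# Hypothetical helper illustrating the filter.by rule; not noisyr's code.
# Assumption: an exon is "above noise" if it reaches the threshold of at
# least one sample; it is "below" only if it is under every sample's threshold.
filter_by_noise_sketch <- function(genes, expression.matrix, noise.thresholds,
                                   filter.by = c("gene", "exon")) {
  filter.by <- match.arg(filter.by)
  stopifnot(nrow(expression.matrix) == nrow(genes),
            length(noise.thresholds) == ncol(expression.matrix))
  # Compare column j (sample j) against noise.thresholds[j]
  above <- sweep(as.matrix(expression.matrix), 2, noise.thresholds, FUN = ">=")
  exon.keep <- rowSums(above) > 0
  if (filter.by == "exon") {
    # exon-level: each row stands or falls on its own
    keep <- exon.keep
  } else {
    # gene-level: keep every exon of a gene that has at least one exon above noise
    genes.above <- unique(genes$gene_id[exon.keep])
    keep <- genes$gene_id %in% genes.above
  }
  genes[keep, , drop = FALSE]
}

# Toy data (hypothetical): geneA has one exon above and one below noise,
# geneB is above noise only in sample 2, geneC is below noise everywhere.
genes <- data.frame(
  id = 1:4,
  gene_id = c("geneA", "geneA", "geneB", "geneC"),
  seqid = "seq1",
  start = c(1, 201, 501, 901),
  end = c(100, 300, 600, 1000)
)
expression.matrix <- matrix(c(5, 0, 0, 1,   # sample 1
                              0, 0, 6, 0),  # sample 2
                            ncol = 2)
noise.thresholds <- c(2, 4)

filter_by_noise_sketch(genes, expression.matrix, noise.thresholds, "gene")$id
# [1] 1 2 3
filter_by_noise_sketch(genes, expression.matrix, noise.thresholds, "exon")$id
# [1] 1 3
```

With filter.by = "gene", the below-noise second exon of geneA (id 2) survives because its sibling exon is above noise; with filter.by = "exon" it is dropped. geneC is removed in both modes.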

Value

Returns a filtered version of genes (a tibble of exons), with the entries below the noise thresholds removed.

Examples

bams <- rep(system.file("extdata", "ex1.bam", package = "Rsamtools", mustWork = TRUE), 2)
genes <- data.frame("id" = 1:2, "gene_id" = c("gene1", "gene2"), "seqid" = c("seq1", "seq2"), "start" = 1, "end" = 1600)
noise.thresholds <- c(0, 1)
expression.summary <- calculate_expression_similarity_transcript(bams = bams, genes = genes, mapq.unique = 99)
#> Calculating expression similarity for 2 genes...
#> this process may take a long time...
#> ncores=1, running sequentially...
#> Finished! Time elapsed: 0.31 secs
filter_genes_transcript(genes = genes, expression.matrix = expression.summary$expression.matrix,
                        noise.thresholds = noise.thresholds)
#> filtering genes using the noise thresholds
#> doing filtering by gene
#> kept 2 entries out of 2
#>   id gene_id seqid start  end
#> 1  1   gene1  seq1     1 1600
#> 2  2   gene2  seq2     1 1600