Function to filter the gene table for the transcript approach

This function filters the gene table (usually created with cast_gtf_to_genes), keeping only genes above the noise thresholds. It takes as input the gene table (usually containing individual exons), an expression matrix covering the entries of that table, and a vector of abundance thresholds, one per sample. It is used internally by remove_noise_from_bams to determine which genes to retain.

filter_genes_transcript(
  genes,
  expression.matrix,
  noise.thresholds,
  filter.by = c("gene", "exon"),
  ...
)

Arguments

genes	a tibble of the exons extracted from the gtf file (usually the output of `cast_gtf_to_genes`)
expression.matrix	the expression matrix, usually calculated by `calculate_expression_similarity_transcript`
noise.thresholds	a vector of expression thresholds by sample
filter.by	Either "gene" (default) or "exon"; if filter.by="gene", a gene (as determined by its ENSEMBL id) is removed if and only if all of its exons are below the corresponding noise thresholds; if filter.by="exon", then each exon is individually removed if it is below the corresponding noise thresholds.
...	arguments passed on to other methods
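
The difference between the two `filter.by` modes can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy objects `exons`, `expr` and `thresholds` are hypothetical, and the sketch assumes that an exon counts as above noise when it exceeds the threshold in at least one sample.

```r
# Hypothetical toy data: 3 exons belonging to 2 genes, measured in 2 samples.
exons <- data.frame(
  id = 1:3,
  gene_id = c("geneA", "geneA", "geneB")
)
expr <- matrix(
  c(5, 0,
    0, 0,
    0, 1),
  ncol = 2, byrow = TRUE
)
thresholds <- c(2, 2)  # one noise threshold per sample (column)

# TRUE where an exon exceeds the threshold of the matching sample.
above <- sweep(expr, 2, thresholds, FUN = ">")

# Assumption: an exon is above noise if it passes in at least one sample.
exon_pass <- rowSums(above) > 0

# filter.by = "exon": each exon is kept or removed on its own.
kept_by_exon <- exons[exon_pass, ]

# filter.by = "gene": a gene is removed only if all of its exons fail,
# so every exon of a gene is kept as soon as one of them passes.
gene_pass <- tapply(exon_pass, exons$gene_id, any)
keep <- unname(gene_pass[as.character(exons$gene_id)])
kept_by_gene <- exons[keep, ]
```

In this toy case only the first exon passes, so exon-level filtering keeps one row, while gene-level filtering keeps both exons of `geneA` and drops `geneB`.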

Value

Returns a filtered tibble of exons, with the noise removed.

Examples

bams <- rep(system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE), 2)
genes <- data.frame("id" = 1:2,
                    "gene_id" = c("gene1", "gene2"),
                    "seqid" = c("seq1", "seq2"),
                    "start" = 1,
                    "end" = 1600)
noise.thresholds <- c(0, 1)
expression.summary <- calculate_expression_similarity_transcript(
  bams = bams,
  genes = genes,
  mapq.unique = 99
)
#> Calculating expression similarity for 2 genes...
#>     this process may take a long time...
#>     ncores=1, running sequentially...
#> Finished! Time elapsed: 0.31 secs
filter_genes_transcript(
  genes = genes,
  expression.matrix = expression.summary$expression.matrix,
  noise.thresholds = noise.thresholds
)
#>   filtering genes using the noise thresholds
#>     doing filtering by gene
#>   kept 2 entries out of 2
#>   id gene_id seqid start  end
#> 1  1   gene1  seq1     1 1600
#> 2  2   gene2  seq2     1 1600