R/filter_genes_transcript.R
filter_genes_transcript.Rd
This function filters the gene table (usually created with cast_gtf_to_genes), keeping only genes above the noise thresholds.
It takes as input the gene table (usually containing individual exons),
an expression matrix for these exons, and a vector of abundance thresholds.
This function is used internally by remove_noise_from_bams to determine which genes to retain.
filter_genes_transcript(
  genes,
  expression.matrix,
  noise.thresholds,
  filter.by = c("gene", "exon"),
  ...
)
genes | a tibble of the exons extracted from the gtf file (usually the output of cast_gtf_to_genes) |
---|---|
expression.matrix | the expression matrix, usually calculated by calculate_expression_similarity_transcript |
noise.thresholds | a vector of expression thresholds by sample |
filter.by | Either "gene" (default) or "exon"; if filter.by="gene", a gene (as determined by its ENSEMBL id) is removed if and only if all of its exons are below the corresponding noise thresholds; if filter.by="exon", then each exon is individually removed if it is below the corresponding noise thresholds. |
... | arguments passed on to other methods |
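
The difference between the two filter.by modes can be illustrated with a small base R sketch. This is not the package's implementation: the toy data, the helper variable names, and the reading of "below the corresponding noise thresholds" as "below the threshold in every sample" are all assumptions made for illustration.

```r
# Illustrative sketch only; NOT the implementation used by filter_genes_transcript.
# Toy data (hypothetical): three exons across two genes, two samples.
genes <- data.frame(
  id      = 1:3,
  gene_id = c("geneA", "geneA", "geneB"),
  stringsAsFactors = FALSE
)
expression.matrix <- matrix(
  c(5, 0, 1,   # sample s1, exons 1..3
    7, 2, 0),  # sample s2, exons 1..3
  nrow = 3,
  dimnames = list(NULL, c("s1", "s2"))
)
noise.thresholds <- c(s1 = 2, s2 = 3)

# Assumption: an exon counts as noise when it is below the threshold in every sample.
below <- sweep(expression.matrix, 2, noise.thresholds, FUN = "<")
exon.is.noise <- apply(below, 1, all)

# filter.by = "exon": each noisy exon is dropped individually.
keep.exon <- !exon.is.noise

# filter.by = "gene": a gene is dropped only if all of its exons are noisy.
gene.is.noise <- tapply(exon.is.noise, genes$gene_id, all)
keep.gene <- as.vector(!gene.is.noise[genes$gene_id])

genes[keep.exon, ]  # exon-level filtering
genes[keep.gene, ]  # gene-level filtering
```

In this toy case, exon-level filtering keeps only exon 1, while gene-level filtering keeps both exons of geneA, because exon 1 is above the thresholds and so protects the whole gene.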
Returns a filtered tibble of exons, with the noise removed.
# Two copies of the example BAM file shipped with Rsamtools
bams <- rep(system.file("extdata", "ex1.bam", package = "Rsamtools", mustWork = TRUE), 2)
genes <- data.frame("id" = 1:2,
                    "gene_id" = c("gene1", "gene2"),
                    "seqid" = c("seq1", "seq2"),
                    "start" = 1,
                    "end" = 1600)
# One noise threshold per sample
noise.thresholds <- c(0, 1)
expression.summary <- calculate_expression_similarity_transcript(
  bams = bams,
  genes = genes,
  mapq.unique = 99
)
filter_genes_transcript(
  genes = genes,
  expression.matrix = expression.summary$expression.matrix,
  noise.thresholds = noise.thresholds
)
#>   id gene_id seqid start  end
#> 1  1   gene1  seq1     1 1600
#> 2  2   gene2  seq2     1 1600