R/stability-based-parameter-assessment.R
get_feature_stability.Rd
Evaluate the stability of clusterings obtained based on incremental subsets of a given feature set.
get_feature_stability(
data_matrix,
feature_set,
steps,
feature_type,
n_repetitions = 100,
seed_sequence = NULL,
graph_reduction_type = "PCA",
npcs = 30,
ecs_thresh = 1,
ncores = 1,
algorithm = 4,
...
)
A data matrix having the features on the rows and the observations on the columns.
A set of feature names that can be found on the rownames of the data matrix.
Vector containing the sizes of the subsets; negative values will be interpreted as using all features.
A name associated to the feature_set.
The number of repetitions of applying the pipeline with different seeds; ignored if seed_sequence is provided by the user.
A custom seed sequence; if the value is NULL, the sequence will be built starting from 1 with a step of 100.
The graph reduction type, denoting if the graph should be built on either the PCA or the UMAP embedding.
The number of principal components.
The ECS threshold used for merging similar clusterings.
The number of parallel R instances that will run the code. If the value is set to 1, the code will be run sequentially.
An index indicating which community detection algorithm will
be used: Louvain (1), Louvain refined (2), SLM (3) or Leiden (4). More details
can be found in the Seurat's FindClusters
function.
additional arguments passed to the umap method.
A list having one field associated with a step value. Each step contains a list with three fields:
ecc - the EC-Consistency of the partitions obtained on all repetitions
embedding - one UMAP embedding generated on the feature subset
most_frequent_partition - the most common partition obtained across repetitions
The algorithm assumes that the feature_set is already sorted when performing the subsetting. For example, if the user wants to analyze highly variable feature set, they should provide them sorted by their variability.
set.seed(2021)
# create an artificial expression matrix
expr_matrix = matrix(c(runif(100*10), runif(100*10, min = 3, max = 4)), nrow = 200, byrow = TRUE)
rownames(expr_matrix) = as.character(1:200)
colnames(expr_matrix) = paste("feature", 1:10)
feature_stability_result = get_feature_stability(data_matrix = t(expr_matrix),
feature_set = colnames(expr_matrix),
feature_type = "feature_name",
steps = 5,
npcs = 2,
n_repetitions = 10,
algorithm = 1,
# the following parameters are used by the umap function and are not mandatory
n_neighbors = 3,
approx_pow = TRUE,
n_epochs = 0,
init = "random",
min_dist = 0.3)
plot_feature_stability_boxplot(feature_stability_result)