Evaluates clustering stability when changing the values of different parameters involved in the graph building step, namely the base embedding, the graph type and the number of neighbours.

get_nn_importance(
  object,
  n_neigh_sequence,
  n_repetitions = 100,
  seed_sequence = NULL,
  graph_reduction_type = "PCA",
  ecs_thresh = 1,
  ncores = 1,
  transpose = (graph_reduction_type == "PCA"),
  graph_type = 2,
  algorithm = 4,
  ...
)

Arguments

object

The data matrix. If the graph reduction type is PCA, the object should be an expression matrix, with features on rows and observations on columns; in the case of UMAP, the user could also provide a matrix associated to a PCA embedding. See also the transpose argument.

n_neigh_sequence

A sequence of the number of nearest neighbours.

n_repetitions

The number of repetitions of applying the pipeline with different seeds; ignored if seed_sequence is provided by the user.

seed_sequence

A custom seed sequence; if the value is NULL, the sequence will be built starting from 1 with a step of 100.

graph_reduction_type

The graph reduction type, denoting if the graph should be built on either the PCA or the UMAP embedding.

ecs_thresh

The ECS threshold used for merging similar clusterings.

ncores

The number of parallel R instances that will run the code. If the value is set to 1, the code will be run sequentially.

transpose

Logical: whether the input object will be transposed or not. Set to FALSE if the input is an observations X features matrix, and set to TRUE if the input is a features X observations matrix.

graph_type

Argument indicating whether the graph should be unweighted (0), weighted (1) or both (2).

algorithm

An index indicating which community detection algorithm will be used: Louvain (1), Louvain refined (2), SLM (3) or Leiden (4). More details can be found in the Seurat's FindClusters function.

...

Additional arguments passed to the irlba::irlba or the uwot::umap method, depending on the value of graph_reduction_type.

Value

A list having three fields:

  • n_neigh_k_corresp - list containing the number of the clusters obtained by running the pipeline multiple times with different seed, number of neighbors and graph type (weighted vs unweigted)

  • n_neigh_ec_consistency - list containing the EC consistency of the partitions obtained at multiple runs when changing the number of neighbors or the graph type

  • n_different_partitions - the number of different partitions obtained by each number of neighbors

Examples

set.seed(2021)
# create an artificial expression matrix
expr_matrix = matrix(c(runif(100*10), runif(100*10, min=5, max=6)), nrow = 200)
rownames(expr_matrix) = as.character(1:200)

nn_importance_obj = get_nn_importance(object = expr_matrix,
    n_neigh_sequence = c(10,15,20),
    n_repetitions = 10,
    graph_reduction_type = "PCA",
    algorithm = 1,
    transpose = FALSE, # the matrix is already observations x features, so we won't transpose it
    # the following parameter is used by the irlba function and is not mandatory
    nv = 2)
plot_n_neigh_ecs(nn_importance_obj)