Chapter 5 Graph Clustering

The final step in a standard single-cell analysis pipeline is applying a graph-based clustering method. Choosing a community detection algorithm has a significant impact on the partitioning results. We assess the stability and reproducibility of results obtained using various graph clustering methods available in the Seurat package: Louvain, Louvain refined, SLM and Leiden.

As before, the stability of each method is evaluated using the Element-Centric Consistency (ECC), computed on the list of partitions obtained over multiple runs.
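
To make the idea of a consistency score over a list of partitions concrete, here is a minimal sketch. It assumes disjoint (hard) partitions, for which the personalised-PageRank definition of element-centric similarity reduces to a closed form based on cluster sizes and overlaps, and it takes the consistency of an element to be the average of its similarities over all pairs of runs. The function names and the toy runs are illustrative only; this is not the implementation used by the app.

```python
from collections import Counter
from itertools import combinations


def element_similarity(labels_a, labels_b):
    """Per-element similarity between two disjoint partitions of the same elements."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same elements")
    size_a = Counter(labels_a)                    # cluster sizes in partition A
    size_b = Counter(labels_b)                    # cluster sizes in partition B
    overlap = Counter(zip(labels_a, labels_b))    # sizes of cluster intersections
    scores = []
    for a, b in zip(labels_a, labels_b):
        n_a, n_b, shared = size_a[a], size_b[b], overlap[(a, b)]
        # Half the L1 distance between the two affinity vectors of this element
        # (closed form, valid for disjoint clusters only).
        half_l1 = 0.5 * (
            shared * abs(1 / n_a - 1 / n_b)
            + (n_a - shared) / n_a
            + (n_b - shared) / n_b
        )
        scores.append(1.0 - half_l1)
    return scores


def element_consistency(partitions):
    """Average, per element, of the pairwise similarities across all runs."""
    pairs = list(combinations(partitions, 2))
    if not pairs:
        raise ValueError("need at least two partitions")
    totals = [0.0] * len(partitions[0])
    for p, q in pairs:
        for i, score in enumerate(element_similarity(p, q)):
            totals[i] += score
    return [total / len(pairs) for total in totals]


# Toy example: three runs over four cells (made-up labels).
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first run, labels swapped
    [0, 0, 0, 0],  # all cells merged into one cluster
]
ecc = element_consistency(runs)
```

In this toy example the first two runs agree perfectly despite the swapped labels, while the third run halves the similarity for every cell, so each cell ends up with a consistency of 2/3.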

The initial plot can be viewed in two ways: clustering consistency against an increasing number of clusters, or against an increasing resolution parameter. Increasing the resolution leads to more clusters; in addition, we note consistently high stability for all four algorithms.

Figure 5.1: Boxplots showing the distribution of the resolution-wise stability for different clustering algorithms

5.1 Overall Stability

These two plots summarise the previous two plots per clustering algorithm. As with the first stability summaries, we extract the median for each resolution value and plot the resulting distribution as the overall stability.

Figure 5.2: Boxplots showing the combination of the previous two plots, summarised per clustering algorithm.
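
The per-algorithm summary can be sketched as follows: take the median ECC at each resolution value, then collect those medians into one distribution per algorithm. The nested dictionary layout, the function name `overall_stability` and the numbers are assumptions made up for illustration; they are not output from the app.

```python
from statistics import median

# Hypothetical input: algorithm -> resolution -> per-cell ECC scores (toy values).
ecc_by_run = {
    "Louvain": {0.5: [1.0, 1.0, 0.9], 1.0: [0.8, 0.9, 1.0]},
    "Leiden": {0.5: [1.0, 0.7, 0.9], 1.0: [0.6, 0.8, 0.7]},
}


def overall_stability(ecc_by_run):
    """Return, per algorithm, the list of median ECC values ordered by resolution."""
    return {
        algorithm: [median(scores) for _, scores in sorted(by_resolution.items())]
        for algorithm, by_resolution in ecc_by_run.items()
    }


summary = overall_stability(ecc_by_run)
```

Each list in `summary` is the distribution that a single boxplot would display for the corresponding algorithm.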

5.2 Correspondence between the resolution value and the number of clusters

Here, we show the relationship between the number of clusters and the resolution value. This plot also indicates suitable resolution values for a predefined number of clusters. The colour gradient represents either the frequency of the partitions having k clusters or their ECC. It can also serve as a proxy for the co-variation between k and the resolution. Lighter colours (higher values) indicate that the number of clusters obtained at a given resolution varies little when the random seed changes. The size illustrates the frequency of the most common partition when both the resolution and the number of clusters are fixed.

Figure 5.3: Correspondence between the resolution value and the number of clusters.
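
The two frequencies encoded in this plot can be sketched as below, assuming each run is available as a (resolution, labels) pair. Partitions are relabelled by order of first appearance so that two runs differing only in label names count as the same partition. The function names, the dictionary keys and the toy runs are hypothetical.

```python
from collections import Counter, defaultdict


def canonical(labels):
    """Relabel clusters by order of first appearance so equal partitions compare equal."""
    first_seen = {}
    return tuple(first_seen.setdefault(label, len(first_seen)) for label in labels)


def resolution_k_summary(runs):
    """Summarise runs given as (resolution, labels) pairs, keyed by (resolution, k)."""
    runs_per_resolution = Counter()
    grouped = defaultdict(Counter)  # (resolution, k) -> partition -> count
    for resolution, labels in runs:
        partition = canonical(labels)
        k = len(set(partition))
        runs_per_resolution[resolution] += 1
        grouped[(resolution, k)][partition] += 1

    summary = {}
    for (resolution, k), partitions in grouped.items():
        n_runs = sum(partitions.values())
        summary[(resolution, k)] = {
            # Share of runs at this resolution that produced k clusters.
            "k_frequency": n_runs / runs_per_resolution[resolution],
            # Share of those runs that produced the most common partition.
            "top_partition_frequency": partitions.most_common(1)[0][1] / n_runs,
        }
    return summary


# Toy example: four seeds at a single resolution (made-up labels).
runs = [
    (0.5, [0, 0, 1, 1]),
    (0.5, [1, 1, 0, 0]),  # same partition as above, labels swapped
    (0.5, [0, 0, 0, 1]),
    (0.5, [0, 1, 2, 2]),
]
summary = resolution_k_summary(runs)
```

Here three of the four runs at resolution 0.5 give two clusters, and two of those three share the same partition.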

5.3 The stability of the number of clusters

The following plot shows how the stability of a given number of clusters co-varies with the number of different partitions that result from changing the seed or the resolution parameter. A high number of different partitions indicates lower stability for a given number of clusters. The colour gradient is proportional to either the frequency of the most common partition with a fixed number of clusters or the ECC of these partitions, as an indicator of robustness. The size indicates the frequency of partitions with k clusters relative to the total number of runs, and shows whether the behaviour described by the colour is replicated across multiple instances. Note that even with a high number of different partitions, the overall stability is high if the frequency of the most common one is close to 1. A high number of partitions, each with low frequency, indicates high instability.

Figure 5.4: Stability values for varying number of clusters
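
As a rough illustration of the quantities behind this plot, the sketch below groups all runs by their number of clusters k and reports, for each k, how many distinct partitions were seen, how often the most common one occurred, and how often k itself occurred across all runs. It assumes the runs are pooled over seeds and resolutions into one list of label vectors; the function name, dictionary keys and labels are made up for illustration.

```python
from collections import Counter, defaultdict


def k_stability(partitions):
    """Summarise a pooled list of label vectors by their number of clusters."""
    by_k = defaultdict(Counter)  # k -> canonical partition -> count
    for labels in partitions:
        # Relabel by first appearance so label permutations are not counted twice.
        first_seen = {}
        partition = tuple(first_seen.setdefault(l, len(first_seen)) for l in labels)
        by_k[len(first_seen)][partition] += 1

    total_runs = len(partitions)
    summary = {}
    for k, counts in by_k.items():
        runs_with_k = sum(counts.values())
        summary[k] = {
            "n_partitions": len(counts),
            "top_partition_frequency": max(counts.values()) / runs_with_k,
            "k_frequency": runs_with_k / total_runs,
        }
    return summary


# Toy example: five runs over four cells (made-up labels).
pooled = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 2, 2],
]
stability = k_stability(pooled)
```

In this example k = 2 appears in four of five runs with two distinct partitions, the most common covering three of the four, which would read as fairly stable under the interpretation given above.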

5.4 Fixing a clustering method

Finally, at the end of this tab, you can select a clustering method based on all of the results presented above; you should choose the method that shows the greatest consistency across evaluations. Once a method has been selected, you are free to move to the configuration Comparison tab.

Figure 5.5: Options to fix a clustering method. You are also prompted to select a number of clusters that you consider stable and useful for comparison.