Chapter 4 Graph construction
The next step in a standard single-cell analysis pipeline is building a graph of the cells using a nearest neighbour algorithm. The following parameters influence the final partitioning:
base embedding: the graph can be built on either the PCA or the UMAP embedding (using the expression matrix directly isn’t recommended, as the distances would be noisier and the runtime would increase)
the number of neighbours
the graph type: the graph can be either an unweighted nearest neighbour (NN) graph or a weighted Shared Nearest Neighbour (SNN) graph. For the latter, the weight of an edge is the Jaccard Similarity Index (JSI) between the neighbourhoods of the two cells it connects.
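To make the difference between the two graph types concrete, here is a minimal pure-Python sketch on a toy 2-D embedding. It assumes Euclidean distance, ties broken by cell index, that each neighbourhood includes the cell itself when computing the Jaccard index, and that no pruning threshold is applied to the SNN weights. The helper names (`knn_sets`, `nn_graph`, `snn_graph`) are hypothetical and are not part of any package API.

```python
import math
from itertools import combinations


def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbours
    (Euclidean distance; the point itself is excluded; ties broken by index)."""
    neighbours = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append({j for _, j in ranked[:k]})
    return neighbours


def nn_graph(neighbours):
    """Unweighted NN graph: undirected edge (i, j) if either cell lists the other as a neighbour."""
    return {(min(i, j), max(i, j)) for i, ns in enumerate(neighbours) for j in ns}


def snn_graph(neighbours):
    """Weighted SNN graph: edge weight = Jaccard index of the two neighbourhoods.

    Assumptions: each neighbourhood includes the cell itself, and pairs whose
    neighbourhoods do not overlap get no edge (no pruning threshold is applied)."""
    hoods = [ns | {i} for i, ns in enumerate(neighbours)]
    weights = {}
    for i, j in combinations(range(len(hoods)), 2):
        shared = len(hoods[i] & hoods[j])
        if shared:
            weights[(i, j)] = shared / len(hoods[i] | hoods[j])
    return weights


# Toy 2-D "embedding": two well-separated groups of three cells.
cells = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
neigh = knn_sets(cells, k=1)
print(sorted(nn_graph(neigh)))  # [(0, 1), (1, 2), (3, 4), (4, 5)]
print(snn_graph(neigh))         # e.g. (0, 2) gets weight 1/3: one shared cell out of three
```

Note that the SNN graph links cells 0 and 2 even though neither is the other's nearest neighbour, because they share neighbour 1; the weight reflects how much the two neighbourhoods overlap.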
4.1 Relationship between the number of neighbours and the number of connected components
Once a feature set has been selected, you can visualise how the number of connected components (a connected component is a subgraph within which there is a path between every pair of nodes) varies with the number of neighbours, using both the PCA and the UMAP reductions as the base for graph building. As the number of neighbours increases, the number of connected components decreases or stays the same. This is expected: increasing the number of neighbours adds edges and results in a better connected graph. Note that the number of connected components is a lower bound on the number of clusters obtained by downstream community detection algorithms such as Louvain and Leiden, because these algorithms never place cells from different connected components in the same cluster. In this example, increasing the number of neighbours has no effect on the overall number of connected components.
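The relationship between the number of neighbours and the number of connected components can be reproduced on toy data with the sketch below. It assumes a Euclidean kNN graph that is treated as undirected (an edge exists if either cell lists the other as a neighbour) and counts components with a union-find structure; the helper names are hypothetical and the toy coordinates are made up for illustration.

```python
import math


def knn_sets(points, k):
    """Indices of the k nearest neighbours of each point (Euclidean, self excluded, ties by index)."""
    out = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        out.append({j for _, j in ranked[:k]})
    return out


def count_components(neighbours):
    """Number of connected components of the symmetrised kNN graph, via union-find."""
    parent = list(range(len(neighbours)))

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, ns in enumerate(neighbours):
        for j in ns:
            parent[find(i)] = find(j)  # merge the components containing i and j
    return len({find(i) for i in range(len(parent))})


# Two well-separated groups of cells: small k keeps them apart, larger k bridges them.
cells = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
for k in (1, 2, 3):
    print(k, count_components(knn_sets(cells, k)))
# 1 2
# 2 2
# 3 1
```

For k = 1 and k = 2, any community detection run on this graph would return at least two clusters; from k = 3 onwards the two groups are joined into a single component.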
4.2 Relationship between the number of neighbours and the number of clusters
The second plot in this tab explores the effect of the number of neighbours on the number of clusters (k). You can also visualise this effect across the different graph types.
4.3 Overall stability across the number of neighbours
The final plot in the graph construction tab allows you to look into the effect of the random seed on the stability of the partitions. More stable partitions exhibit higher ECC values, with little variation across seeds. Hovering over the plot allows you to obtain information on a specific region, and clicking displays a UMAP for that partition, coloured by the stability across random seeds.
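As a rough illustration of what a per-cell stability score across seeds captures, the sketch below assumes that ECC is the per-cell average of the element-centric similarity (Gates et al., 2019) over all pairs of partitions obtained with different seeds, restricted to flat, non-overlapping clusterings. For such clusterings, each cell's affinity vector is uniform (1/|C|) over the members of its cluster C. This is a simplified reimplementation for intuition, not the package's own code, and the function names are hypothetical.

```python
from itertools import combinations


def element_similarity(labels_a, labels_b):
    """Element-centric similarity of two flat, disjoint partitions, one score per element.

    Each element i gets an affinity vector that is uniform (1/|C|) over the members
    of its cluster C; the score is 1 minus half the L1 distance between the two vectors."""
    def members(labels):
        groups = {}
        for idx, lab in enumerate(labels):
            groups.setdefault(lab, set()).add(idx)
        return [groups[lab] for lab in labels]

    ca, cb = members(labels_a), members(labels_b)
    scores = []
    for i in range(len(labels_a)):
        a, b = ca[i], cb[i]
        # Only members of either cluster can have a non-zero affinity difference.
        l1 = sum(abs((j in a) / len(a) - (j in b) / len(b)) for j in a | b)
        scores.append(1 - 0.5 * l1)
    return scores


def element_consistency(partitions):
    """Per-element average of element-centric similarity over all pairs of partitions."""
    pairs = list(combinations(partitions, 2))
    totals = [0.0] * len(partitions[0])
    for p, q in pairs:
        for i, s in enumerate(element_similarity(p, q)):
            totals[i] += s
    return [t / len(pairs) for t in totals]


# Hypothetical cluster labels for 4 cells from three runs with different seeds.
runs = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 1, 1]]
print(element_consistency(runs))
```

Because only cluster membership matters, the score does not depend on the arbitrary label values each seed assigns; cells whose cluster changes between seeds (here cell 2) receive the lowest consistency.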