Chapter 4 Graph construction

The next step in a standard single-cell analysis pipeline is building a cell-to-cell graph with a nearest-neighbour algorithm. The following parameters influence the final partitioning:

base embedding: the graph can be built on either the PCA or the UMAP embedding (using the expression matrix isn’t recommended, as the distances would be noisier and the runtime would increase)
the number of neighbours
the graph type: the graph can be either an unweighted Nearest-Neighbour (NN) graph or a weighted Shared-Nearest-Neighbour (SNN) graph. For the latter, the edge weights are computed using the Jaccard Similarity Index (JSI) between the neighbourhoods of the two cells that the edge connects.
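
To make the SNN weighting concrete, here is a minimal sketch in plain Python. It is not the implementation used by the application: the helper names (`knn_sets`, `jaccard`, `snn_weights`), the toy coordinates, and the choice to include each cell in its own neighbourhood are assumptions made for illustration only.

```python
from math import dist

def knn_sets(points, k):
    """For each point, return the set holding its k nearest neighbours plus itself."""
    neighbourhoods = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        # Assumption: a cell belongs to its own neighbourhood.
        neighbourhoods.append(set(others[:k]) | {i})
    return neighbourhoods

def jaccard(a, b):
    """Jaccard Similarity Index: size of the intersection over size of the union."""
    return len(a & b) / len(a | b)

def snn_weights(points, k):
    """Weight every nearest-neighbour edge by the JSI of the two neighbourhoods."""
    nbrs = knn_sets(points, k)
    weights = {}
    for i, neighbourhood in enumerate(nbrs):
        for j in neighbourhood:
            if i != j:
                edge = (min(i, j), max(i, j))  # store each edge once, undirected
                weights[edge] = jaccard(nbrs[i], nbrs[j])
    return weights

# Toy "embedding": five cells placed along a line (hypothetical coordinates).
cells = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (4.5, 0.0), (7.0, 0.0)]
weights = snn_weights(cells, k=2)
for edge, w in sorted(weights.items()):
    print(edge, w)
```

In this toy example, cells 0 and 1 share an identical neighbourhood and receive the maximum weight, while cells 0 and 2 share only two of four distinct neighbours and receive a lower weight. In the unweighted NN case, every edge would simply count the same.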

4.1 Relationship between the number of neighbours and the number of connected components

Once a feature set has been selected, you can visualise how the number of connected components varies with the number of neighbours, using both the PCA and the UMAP reductions as the base for graph building. A connected component is a subgraph within which there exists a path between every pair of nodes. In general, as the number of neighbours increases, the number of connected components decreases or stays the same; this is expected, as increasing the number of neighbours results in a better-connected graph. Please note that the number of connected components provides a lower bound on the number of clusters that can be obtained by downstream community detection algorithms such as Louvain and Leiden. In the example shown here, increasing the number of neighbours does not affect the overall number of connected components.
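
The relationship described above can be illustrated with a small, self-contained sketch. It is not the application's code: the function name `count_components`, the toy coordinates, and the decision to treat nearest-neighbour edges as undirected are assumptions chosen for illustration.

```python
from math import dist

def count_components(points, k):
    """Count the connected components of an undirected k-nearest-neighbour graph."""
    parent = list(range(len(points)))  # union-find forest, one tree per component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        for j in others[:k]:
            # Assumption: an edge i -> j connects both cells (undirected graph).
            parent[find(i)] = find(j)

    return len({find(i) for i in range(len(points))})

# Toy "embedding": three well-separated groups of cells (hypothetical coordinates).
cells = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0),
         (20.0, 0.0), (21.0, 0.0), (22.0, 0.0)]
components = {k: count_components(cells, k) for k in (1, 2, 3)}
print(components)
```

On these toy points the count drops as `k` grows, because each cell is eventually forced to link to a cell outside its own group. Any clustering obtained on a given graph has at least as many clusters as that graph has components, since community detection cannot merge cells that are not connected by a path.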

Basic Plot

Figure 4.1: Basic plot for the relationship between the Number of Neighbours (NN) and the number of connected components

Information

Figure 4.2: Information displayed

Download options

Figure 4.3: Download options

Customisation

Figure 4.4: Colour scheme options

4.2 Relationship between the number of neighbours and the number of clusters

The second plot for this tab explores the effect of the number of neighbours on the number of clusters (k). You can also visualise this effect across different graph types.

Basic Plot

Figure 4.5: Basic plot for the relationship between the number of neighbours and the number of clusters (k)

Hover/Click options

Figure 4.6: Extra information displayed upon hovering/clicking

Info

Figure 4.7: Information displayed

Download options

Figure 4.8: Download options

Customisation

Figure 4.9: Colour scheme and plot customisation options

4.3 Overall Stability across Number of Neighbours

The final plot in the graph construction tab lets you examine the effect of the random seed on the stability of the partitions. More stable partitions exhibit higher Element-Centric Consistency (ECC) values, with little variation across seeds. Hovering over the plot allows you to obtain information on a specific region, and clicking displays a UMAP for that partition, coloured by the stability across random seeds.

Basic Plot

Figure 4.10: Basic plot for the Overall Stability across Number of Neighbours

Hover/Click options

Figure 4.11: Extra information displayed upon hovering/clicking

Info

Figure 4.12: Information displayed

Download options

Figure 4.13: Download options

Customisation

Figure 4.14: Colour scheme and plot customisation options