Chapter 6 Compare the selected configuration

Once you have selected a clustering algorithm and reached a stable configuration that suits the purpose of your study, you still need to choose the number of clusters (k) for your partition. The Config Comparison tab helps with this choice. It shows four UMAP plots, two for each side of the comparison, so you can view your current configuration next to any other configuration available for your dataset. Each plot can be color-coded by ECC stability, number of clusters, or any available metadata feature.

We also include the option for the user to add various metadata.

6.1 UMAPs

Your selected configuration is displayed on the left panels, where you can vary the number of clusters and the color scheme of the plots. These can be color-coded by any available metadata feature, ECC stability, or the number of clusters. The resulting plots can be downloaded in various formats for further analysis or presentation.

Main Plots

Figure 6.1: UMAP plots for comparison between the selected configuration at a varying number of clusters and any other.

6.2 Jaccard Similarity Index (JSI) / Cells per cluster

In addition to the visualizations available in the Config Comparison tab, ClustAssess also allows you to explore how cells behave across different comparisons using heatmaps. Specifically, we calculate the Jaccard Similarity Index (JSI) between your selected clustering configuration and the one you wish to compare it against. Higher JSI values indicate greater similarity in cell assignments to specific clusters.

The heatmap can be visualized either in terms of the JSI values or the number of cells per cluster in each configuration. While JSI provides a more granular view of cell behavior, comparing the sizes of clusters in different configurations can help distinguish between larger and smaller groups of cells.
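To make the two heatmap modes concrete, here is a minimal sketch of how a JSI value and a shared-cell count could be computed for every pair of clusters. It is an illustration only, not the ClustAssess implementation. The function names and the toy labels are hypothetical, and the sketch assumes the cell count shown for a pair of clusters is the number of cells the two clusters share.

```python
from collections import defaultdict


def cluster_members(labels):
    """Map each cluster label to the set of cell indices assigned to it."""
    members = defaultdict(set)
    for cell, label in enumerate(labels):
        members[label].add(cell)
    return dict(members)


def compare_partitions(labels_a, labels_b):
    """Return (jsi, counts), two dicts keyed by (cluster_a, cluster_b)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same cells")
    members_a = cluster_members(labels_a)
    members_b = cluster_members(labels_b)
    jsi, counts = {}, {}
    for a, cells_a in members_a.items():
        for b, cells_b in members_b.items():
            shared = len(cells_a & cells_b)  # cells found in both clusters
            union = len(cells_a | cells_b)   # cells found in either cluster
            counts[(a, b)] = shared
            jsi[(a, b)] = shared / union
    return jsi, counts


# Hypothetical toy labels for six cells under two configurations.
selected = [0, 0, 0, 1, 1, 1]
other = ["x", "x", "y", "y", "z", "z"]
jsi, counts = compare_partitions(selected, other)
```

In this toy example, cluster 0 and cluster "x" share two cells out of three in their union, so their JSI is 2/3, while cluster 0 and cluster "z" share none and score 0.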

Basic Plot

Figure 6.2: Information displayed

Info

Figure 6.3: Information displayed

Download options

Figure 6.4: Download options

Customisation

Figure 6.5: Option to toggle between JSI and number of cells per cluster

6.3 Violin Plots

We also provide tools to visualize particular metadata categories across different numbers of clusters, along with other options such as gene expression.
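A violin plot draws one distribution per group, so the underlying step is grouping a per-cell value by cluster and, optionally, by a second metadata category. The sketch below shows that grouping in plain Python. The function name, the toy values, and the `sex` labels are hypothetical and only illustrate the idea. The app itself does this for you.

```python
from collections import defaultdict
from statistics import median


def group_values(values, clusters, split_by=None):
    """Collect one list of values per cluster (or per cluster/category pair)."""
    groups = defaultdict(list)
    for i, value in enumerate(values):
        key = clusters[i] if split_by is None else (clusters[i], split_by[i])
        groups[key].append(value)
    return dict(groups)


# Hypothetical toy data: features detected per cell, cluster label, and sex.
n_features = [1200, 1500, 900, 2100, 2500, 1800]
clusters = [0, 0, 0, 1, 1, 1]
sex = ["F", "M", "F", "M", "F", "M"]

# One distribution per cluster (one violin each).
per_cluster = group_values(n_features, clusters)
# One distribution per cluster and sex (violins split by a metadata feature).
per_cluster_and_sex = group_values(n_features, clusters, split_by=sex)
# A simple summary of each violin's centre.
medians = {key: median(vals) for key, vals in per_cluster.items()}
```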

Basic Plot

Figure 6.6: Violin plots of the number of features per cluster

Group By features

Figure 6.7: Violin plots for SOX5, grouped by sex

6.4 Gene Expression Heatmap / Bubbleplot

The expression of a gene or gene set can be visualised in a heatmap or bubbleplot across different metadata categories.
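As a rough illustration of what such a plot summarises, the sketch below computes two numbers per metadata category for a single gene: the mean expression and the fraction of cells in which the gene is detected. Mapping these to bubble colour and bubble size is a common convention and is an assumption here, not a confirmed description of how ClustAssess draws its bubbleplot. The function name and toy values are hypothetical.

```python
from collections import defaultdict


def bubble_summary(expression, groups):
    """Per group: (mean expression, fraction of cells with expression > 0)."""
    by_group = defaultdict(list)
    for value, group in zip(expression, groups):
        by_group[group].append(value)
    return {
        group: (sum(vals) / len(vals), sum(v > 0 for v in vals) / len(vals))
        for group, vals in by_group.items()
    }


# Hypothetical normalised expression of one gene across six cells.
expression = [0.0, 2.0, 4.0, 0.0, 0.0, 3.0]
groups = ["A", "A", "A", "B", "B", "B"]
summary = bubble_summary(expression, groups)
```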

Figure 6.8: Differentially expressed genes between two clusters

6.5 Marker Identification

To further explore your selected clustering configuration, ClustAssess provides tools for inferring gene markers and trajectories. For marker identification, we use a modified, faster implementation of Seurat’s inbuilt Wilcoxon rank sum test to detect genes that are differentially expressed between any two conditions. These conditions can be metadata categories or specific clusters, helping you identify genes that are uniquely expressed in certain cell populations. You can visualise the resulting genes and export them as a CSV file. Comparisons can be made between individual clusters, between sets of clusters, or between metadata categories.
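For readers unfamiliar with the test, the sketch below shows a textbook Wilcoxon rank sum test for one gene between two groups of cells. It uses average ranks for ties and a normal approximation without continuity correction. It is not the modified implementation used by ClustAssess or the one in Seurat. The function names and the toy expression values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def rank_sum_test(group_a, group_b):
    """Two-sided rank sum test; returns (U statistic of group_a, p-value)."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both groups must contain at least one cell")
    n = n_a + n_b
    ranks, tie_sizes = average_ranks(list(group_a) + list(group_b))
    u = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    # Variance of U with the standard correction for tied values.
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    var_u = n_a * n_b / 12 * ((n + 1) - tie_term)
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u, p_value


# Hypothetical expression of one gene in two clusters of four cells each.
cluster_1 = [5.1, 4.8, 6.0, 5.5]
cluster_2 = [1.0, 0.0, 2.2, 1.5]
u_stat, p_value = rank_sum_test(cluster_1, cluster_2)
```

With groups this small the normal approximation is crude. The example is meant to show the mechanics of ranking and comparing two conditions, not to replace the app's own results.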

Figure 6.9: Differentially expressed genes between two clusters

6.6 Enrichment Analysis

You can also perform a gene set enrichment analysis on a set of markers from the previous step. The markers can be tested for enrichment against several different data sources, and you can restrict the input to the top n genes, ordered by average log2FC.
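The top n selection can be pictured as a simple sort and cut of the marker table. The sketch below assumes a descending order on the signed average log2FC. Whether the app ranks by signed or absolute values is not stated here, so treat that as an assumption. The function name, column names, and gene names are hypothetical.

```python
def top_markers(markers, n):
    """Return the n gene names with the largest average log2 fold change."""
    ranked = sorted(markers, key=lambda m: m["avg_log2FC"], reverse=True)
    return [m["gene"] for m in ranked[:n]]


# Hypothetical marker table; gene names and values are made up for illustration.
markers = [
    {"gene": "GENE_A", "avg_log2FC": 1.2},
    {"gene": "GENE_B", "avg_log2FC": 3.4},
    {"gene": "GENE_C", "avg_log2FC": 0.4},
    {"gene": "GENE_D", "avg_log2FC": 2.1},
]
query = top_markers(markers, n=2)
```

The resulting list of gene names is what would be passed on to the enrichment step.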

Figure 6.10: Results from the gene set enrichment analysis