Chapter 6 Compare the selected configuration
Once you have selected a clustering algorithm and reached a stable configuration that suits the purpose of your study, you still need to choose the optimal number of clusters (k) for your partition. The Config Comparison tab is designed to help with this. It shows four UMAP plots, two for each of the configurations being compared, which can be color-coded by ECC stability, number of clusters, or any available metadata feature. This lets you visualize your current configuration and compare it to any other possible configuration in your dataset.
We also include the option to add various metadata.
6.1 UMAPs
Your selected configuration is displayed in the left-hand panels, where you can vary the number of clusters and the color scheme of the plots. The plots can be color-coded by any available metadata feature, ECC stability, or the number of clusters, and can be downloaded in various formats for further analysis or presentation.
6.2 Jaccard Similarity Index (JSI)/Cells per cluster
In addition to the UMAP visualizations available in the Config Comparison tab, ClustAssess also allows you to explore how cells behave across different configurations using heatmaps. Specifically, we calculate the Jaccard Similarity Index (JSI) between your selected clustering configuration and the one you wish to compare it against. For two sets of cells, the JSI is the number of cells they share divided by the number of cells in their union, so it ranges from 0 (no overlap) to 1 (identical sets). Higher JSI values indicate greater similarity in cell assignments to specific clusters.
The heatmap can be visualized either in terms of the JSI values or the number of cells per cluster in each configuration. While JSI provides a more granular view of cell behavior, comparing the sizes of clusters in different configurations can help distinguish between larger and smaller groups of cells.
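The two heatmap views described above can be illustrated with a minimal sketch. It assumes the heatmap holds one value per pair of clusters (one cluster from each configuration); the function names and the toy labels are hypothetical and are not part of ClustAssess.

```python
from collections import defaultdict


def cluster_members(labels):
    """Map each cluster label to the set of cell indices assigned to it."""
    members = defaultdict(set)
    for cell, label in enumerate(labels):
        members[label].add(cell)
    return dict(members)


def jaccard_matrix(labels_a, labels_b):
    """Return {(cluster_a, cluster_b): JSI} for every pair of clusters."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both configurations must label the same cells")
    a = cluster_members(labels_a)
    b = cluster_members(labels_b)
    return {
        (ca, cb): len(cells_a & cells_b) / len(cells_a | cells_b)
        for ca, cells_a in a.items()
        for cb, cells_b in b.items()
    }


def cells_per_cluster(labels):
    """Return the number of cells in each cluster."""
    return {label: len(cells) for label, cells in cluster_members(labels).items()}


# Toy example: six cells labelled by two hypothetical configurations.
config_a = [1, 1, 1, 2, 2, 2]  # k = 2
config_b = [1, 1, 2, 2, 3, 3]  # k = 3
jsi = jaccard_matrix(config_a, config_b)
sizes_b = cells_per_cluster(config_b)
```

In this toy case, cluster 1 of the first configuration shares two of its three cells with cluster 1 of the second, giving a JSI of 2/3, while the cluster sizes only report that every cluster in the second configuration holds two cells.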
6.3 Violin Plots
We also provide tools to visualize particular metadata categories against different numbers of clusters, as well as other options such as gene expression.
6.4 Gene Expression Heatmap / Bubbleplot
The expression of a gene or gene set can be visualised as a heatmap or bubbleplot across different metadata categories.
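As a rough illustration of the per-category summaries such a plot could display, the sketch below computes the mean expression and the fraction of expressing cells for each metadata category. This is an assumption about what is encoded (bubbleplots commonly map one quantity to color and the other to bubble size), not a description of the ClustAssess implementation; `bubble_summary`, `threshold`, and the toy data are hypothetical.

```python
def bubble_summary(expression, categories, threshold=0.0):
    """Per category: mean expression and fraction of cells above threshold."""
    if len(expression) != len(categories):
        raise ValueError("expression and categories must have one entry per cell")
    grouped = {}
    for value, category in zip(expression, categories):
        grouped.setdefault(category, []).append(value)
    return {
        category: {
            "mean": sum(values) / len(values),
            "fraction_expressing": sum(v > threshold for v in values) / len(values),
        }
        for category, values in grouped.items()
    }


# Toy example: expression of one gene in six cells from two categories.
expression = [0.0, 2.0, 4.0, 0.0, 0.0, 1.0]
categories = ["ctrl", "ctrl", "ctrl", "treated", "treated", "treated"]
summary = bubble_summary(expression, categories)
```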
6.5 Marker Identification
To further explore your selected clustering configuration, ClustAssess provides tools for inferring gene markers and trajectories. For marker identification, we use a modified, faster implementation of Seurat’s built-in Wilcoxon rank sum test to detect genes that are differentially expressed between any two conditions. These conditions can be individual clusters, sets of clusters, or metadata categories, which helps you identify genes that are uniquely expressed in certain cell populations. You can visualise the resulting genes and export them as a CSV file.
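For readers unfamiliar with the test, here is a minimal, standard-library sketch of a two-sided Wilcoxon rank sum test for one gene between two groups of cells. It uses the normal approximation with a tie correction and no continuity correction; it is not the modified implementation used by ClustAssess or by Seurat, and the function names and toy values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Return (U statistic of group_a, two-sided p-value)."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both groups must contain at least one cell")
    n = n_a + n_b
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    # Tie correction: each group of t tied values reduces the variance.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    var_u = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return u_a, 1.0
    z = (u_a - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_a, p_value


# Toy example: expression of one gene in two clusters of four cells each.
cluster_1 = [5.1, 4.8, 6.0, 5.5]
cluster_2 = [1.0, 0.0, 2.2, 0.0]
u_stat, p_value = rank_sum_test(cluster_1, cluster_2)
```

In practice the test is repeated for every gene, and the resulting p-values are usually adjusted for multiple testing before markers are reported.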
6.6 Enrichment Analysis
We have also added the option to perform a gene set enrichment analysis on a set of markers from the previous step. You can test the markers for enrichment against several different data sources and choose how many of the top n genes, ordered by average log2FC, to use.
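The selection of the top n genes can be sketched as follows. The enrichment step is shown as a one-sided hypergeometric over-representation test, which is a common choice but only an assumption here, since the text does not state which statistical test or service ClustAssess uses. The record layout, function names, and gene identifiers are hypothetical.

```python
from math import comb


def top_n_markers(markers, n):
    """Return the n gene names with the highest average log2 fold change."""
    ranked = sorted(markers, key=lambda m: m["avg_log2FC"], reverse=True)
    return [m["gene"] for m in ranked[:n]]


def hypergeom_enrichment_p(query, gene_set, background):
    """P(overlap >= observed) when drawing len(query) genes from background."""
    background = set(background)
    query = set(query) & background
    gene_set = set(gene_set) & background
    total, in_set, drawn = len(background), len(gene_set), len(query)
    observed = len(query & gene_set)
    favourable = sum(
        comb(in_set, i) * comb(total - in_set, drawn - i)
        for i in range(observed, min(in_set, drawn) + 1)
    )
    return favourable / comb(total, drawn)


# Toy example: four markers, a background of ten genes, one annotated gene set.
markers = [
    {"gene": "g0", "avg_log2FC": 2.5},
    {"gene": "g1", "avg_log2FC": 1.8},
    {"gene": "g2", "avg_log2FC": 3.1},
    {"gene": "g7", "avg_log2FC": 0.4},
]
background = [f"g{i}" for i in range(10)]
pathway = {"g0", "g1", "g2"}

top_genes = top_n_markers(markers, 3)
p_value = hypergeom_enrichment_p(top_genes, pathway, background)
```

Here all three selected genes fall in the three-gene set, so the p-value is the chance of drawing exactly that set from the ten background genes, 1/120.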