Preprocesses data and creates files for SMEW app

This is the main user-facing function to generate a ready-to-use SMEW Shiny app from your spatial metabolomics data. It performs all necessary preprocessing, saves processed data, and writes a static Shiny app (app.R) in the specified output directory. The resulting app enables interactive analysis and visualisation of spatial metabolomics data at bulk, region, and pixel levels.

Usage

create_smew_app(
  intensity_csv,
  metadata_csv,
  output_dir,
  denoise = FALSE,
  anno = NULL,
  adducts = NULL,
  ion_mode = NULL,
  ppm = 10,
  only_annotated = FALSE,
  histology_images_dir = NULL,
  run_autocorrelation = FALSE,
  top_autocorrelated_peaks = 10,
  n_cores = 1,
  run_pixel_enrichment = FALSE,
  multi_modal_path = NULL,
  enrichment_controls = NULL,
  enrichment_comparisons = NULL,
  metabolite_table = NULL,
  pathway_table = NULL,
  pathway_classification = NULL
)

Arguments

intensity_csv: Path to the intensity matrix CSV file. The first column must be pixel IDs, and the remaining columns are m/z features (named as 'mz_<number>' or similar).
metadata_csv: Path to the metadata CSV file. Must contain columns 'pixel_id' (matching intensity matrix), 'x', 'y', and 'Sample'.
output_dir: Directory to save the processed data and the generated app. Will be created if it does not exist.
denoise: Logical; whether to denoise the data (default FALSE).
anno: Optional annotation data.frame or NULL. If provided, should map m/z features to metabolite names and metabolite IDs.
adducts: Optional vector of adducts used for annotation. The available options depend on the ionisation mode.
  Negative mode: M-H [1-], M-2H [2-], M-3H [3-], M-H2O-H [1-], M-H+O [1-], M+K-2H [1-], M+Na-2H [1-], M+Cl [1-], M+Cl37 [1-], M+FA-H [1-], M+Hac-H [1-], M+Br [1-], M+Br81 [1-], M+TFA-H [1-], M+ACN-H [1-], M+HCOO [1-], M+CH3COO [1-], 2M-H [1-], 2M+FA-H [1-], 2M+Hac-H [1-], 3M-H [1-], M(C13)-H [1-], M(S34)-H [1-], M(Cl37)-H [1-].
  Positive mode: M [1+], M+H [1+], M+2H [2+], M+3H [3+], M+Na [1+], M+2Na [2+], M+3Na [3+], M+H+Na [2+], M+H+2Na [3+], M+2H+Na [3+], M+2Na-H [1+], M+NaCl [1+], M+K [1+], M+H+K [2+], M+ACN+H [1+], M+ACN+2H [2+], M+ACN+Na [1+], M+2ACN+2H [2+], M+3ACN+2H [2+], M+2ACN+H [1+], M+H2O+H [1+], M-H2O+H [1+], M-H4O2+H [1+], M-HCOOH+H [1+], M+HCOONa [1+], M-HCOONa+H [1+], M+HCOOK [1+], M-HCOOK+H [1+], M-CO+H [1+], M-CO2+H [1+], M-C3H4O2+H [1+], M+CH3OH+H [1+], M-NH3+H [1+], M+H+NH4 [2+], M+NH4 [1+], M+IsoProp+H [1+], M+IsoProp+Na+H [1+], M+2K+H [1+], M+DMSO+H [1+], 2M+H [1+], 2M+NH4 [1+], 2M+Na [1+], 2M+3H2O+2H [2+], 2M+K [1+], 2M+ACN+H [1+], 2M+ACN+Na [1+], M(C13)+H [1+], M(C13)+2H [2+], M(C13)+3H [3+], M(S34)+H [1+], M(Cl37)+H [1+].
ion_mode: Ionisation mode ('Negative' or 'Positive'). Used for annotation; if either adducts or ion_mode is NULL, the annotation step is skipped and features keep NA names.
ppm: Numeric; mass accuracy in ppm (default 10). Used for annotation.
only_annotated: Logical; if TRUE, only annotated peaks are used (default FALSE).
histology_images_dir: Optional directory with histology images for overlay and visualisation.
run_autocorrelation: Logical (default: FALSE); whether to run spatial autocorrelation and SVM analysis.
top_autocorrelated_peaks: Integer; number of top autocorrelated peaks to use (default 10).
n_cores: Integer; number of cores for parallel processing (default 1, which runs all steps sequentially).
run_pixel_enrichment: Logical (default: FALSE); whether to run pixel-level pathway enrichment.
multi_modal_path: Optional path to multi-modal data file (for multi-omics integration).
enrichment_controls: Optional vector of control samples for enrichment analysis.
enrichment_comparisons: Optional vector of comparison samples for enrichment analysis.
metabolite_table: Optional path to a CSV file with columns MetaboliteID, ExactMass, and MetaboliteName for ID-based annotation.
pathway_table: Optional path to a CSV file with columns PathwayID, PathwayName, and MetaboliteIDs used for ORA.
pathway_classification: Optional path to a CSV file with columns PathwayName, PathwayID, Category1, and Category2 for ORA category overlays.
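
As a rough illustration of the layout expected for intensity_csv and metadata_csv, the sketch below writes a tiny pair of input files with base R. The pixel IDs, m/z column names, intensities, and sample label are made up for illustration; only the required metadata columns ('pixel_id', 'x', 'y', 'Sample') come from the argument descriptions above. Naming the first intensity column pixel_id is an assumption, since the documentation only requires that the first column holds pixel IDs.

```r
# Hypothetical toy data: 4 pixels from one sample on a 70-unit grid.
pixel_ids <- paste0("px", 1:4)

# Intensity matrix: first column is the pixel ID, remaining columns are m/z features.
intensity <- data.frame(
  pixel_id = pixel_ids,
  mz_89.0244 = c(120, 0, 35, 410),
  mz_179.0561 = c(15, 22, 0, 8),
  check.names = FALSE
)

# Metadata: one row per pixel with coordinates and a sample label.
metadata <- data.frame(
  pixel_id = pixel_ids,
  x = c(0, 70, 0, 70),
  y = c(0, 0, 70, 70),
  Sample = "sample_A"
)

# Write both tables to a scratch directory.
input_dir <- file.path(tempdir(), "smew_toy_input")
dir.create(input_dir, showWarnings = FALSE)
intensity_csv <- file.path(input_dir, "intensity.csv")
metadata_csv <- file.path(input_dir, "metadata.csv")
write.csv(intensity, intensity_csv, row.names = FALSE)
write.csv(metadata, metadata_csv, row.names = FALSE)

# Sanity checks mirroring the documented requirements.
required_cols <- c("pixel_id", "x", "y", "Sample")
meta_check <- read.csv(metadata_csv)
int_check <- read.csv(intensity_csv, check.names = FALSE)
stopifnot(all(required_cols %in% names(meta_check)))
stopifnot(setequal(int_check[[1]], meta_check$pixel_id))
```

The two paths, intensity_csv and metadata_csv, could then be passed to the arguments of the same names.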

Value

Invisibly returns the path to the generated app.R file. All processed data and app files are saved in the output directory.
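
Because the path is returned invisibly, it needs to be assigned to a variable to be reused. The sketch below shows one way the returned path might be used to launch the app. The app_path value is a stand-in that assumes the app was written to a smew_app folder inside tempdir(), as in the example output on this page; in real use it would be the value returned by create_smew_app(). shiny::runApp() is the standard Shiny launcher and is not part of this package.

```r
# Stand-in for the returned value; in real use:
#   app_path <- create_smew_app(...)
app_path <- file.path(tempdir(), "smew_app", "app.R")

# Only launch in an interactive session, and only if the app file and shiny exist.
can_launch <- interactive() &&
  file.exists(app_path) &&
  requireNamespace("shiny", quietly = TRUE)

if (can_launch) {
  # runApp() accepts the directory that contains app.R
  shiny::runApp(dirname(app_path))
}
```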

Details

This function:

Checks and validates input files and columns.
Orders metadata to match the intensity matrix.
Maps m/z features to metabolite annotations (if possible).
Optionally filters to only annotated peaks.
Computes grid spacing and transforms coordinates for spatial analysis.
Creates bulk-level and pixel-level data objects.
Optionally denoises the data.
Saves all processed data as .rds files in the output directory.
Optionally overlays pixel coordinates on histology images and copies images to the app directory.
Optionally runs spatial autocorrelation and cross-correlation analysis.
Optionally runs pixel-level pathway enrichment analysis.
Optionally integrates multi-modal data (e.g., transcriptomics or proteomics).
Writes a static app.R file for a fully functional SMEW Shiny app which can be shared with collaborators or the community.
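
The grid-spacing step listed above can be pictured as follows. This is a minimal sketch under stated assumptions, not the package's implementation: it assumes integer-valued pixel coordinates, takes the greatest common divisor (GCD) of the gaps between distinct coordinate values, and divides by it so that neighbouring pixels end up one unit apart. The helper names gcd2 and grid_gcd and the coordinates are hypothetical.

```r
# Greatest common divisor of two non-negative integers (Euclid's algorithm).
gcd2 <- function(a, b) {
  while (b != 0) {
    r <- a %% b
    a <- b
    b <- r
  }
  a
}

# GCD of the gaps between the distinct coordinate values along one axis.
grid_gcd <- function(coord) {
  gaps <- diff(sort(unique(coord)))
  if (length(gaps) == 0) return(1)  # single coordinate value: nothing to rescale
  Reduce(gcd2, gaps)
}

# Made-up pixel coordinates on a 70-unit raster.
x <- c(140, 210, 350, 140)
y <- c(70, 70, 280, 140)

# Common spacing across both axes.
spacing <- gcd2(grid_gcd(x), grid_gcd(y))

# Rescale so that adjacent pixels are one unit apart, starting at 0.
x_unit <- (x - min(x)) / spacing
y_unit <- (y - min(y)) / spacing
```

With these coordinates the spacing is 70, which matches the order of magnitude reported in the example output ("GCD values for grid spacing: 70"), and the rescaled coordinates become small integers suitable for raster-style plotting.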

The generated app supports interactive analysis at multiple levels, including quality control, differential analysis, pathway enrichment, network inference, clustering, spatial visualisation, and more. See the package documentation and vignette for a full description of the app features and input requirements.
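
To make the ppm argument concrete: an observed m/z is usually considered a match to a theoretical ion m/z when |observed - theoretical| / theoretical x 10^6 is at most ppm. The sketch below applies that rule for a single adduct, M-H [1-]. It illustrates ppm tolerance in general and is not the package's internal annotation function; the reference table, the match_ppm helper, and the assumption that ExactMass holds the neutral monoisotopic mass are all hypothetical.

```r
proton_mass <- 1.007276  # Da; subtracted to form the M-H [1-] ion

# Hypothetical reference table using the column names described for
# metabolite_table (MetaboliteID, ExactMass, MetaboliteName).
metabolites <- data.frame(
  MetaboliteID = c("met_1", "met_2"),
  ExactMass = c(180.06339, 90.03169),  # assumed neutral monoisotopic masses
  MetaboliteName = c("Glucose", "Lactic acid")
)
metabolites$ion_mz <- metabolites$ExactMass - proton_mass

# Return the names of metabolites whose ion m/z lies within `ppm` of `observed_mz`.
match_ppm <- function(observed_mz, ref, ppm = 10) {
  ppm_error <- abs(observed_mz - ref$ion_mz) / ref$ion_mz * 1e6
  ref$MetaboliteName[ppm_error <= ppm]
}

hit <- match_ppm(179.0570, metabolites)   # roughly 5 ppm from the glucose ion
miss <- match_ppm(179.0600, metabolites)  # roughly 22 ppm away: no match
```

A smaller ppm value makes matching stricter and reduces ambiguous annotations, at the cost of missing features measured with lower mass accuracy.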

Examples

create_smew_app(
  intensity_csv = system.file("extdata", "bleo_sub_intensity.csv", package = "smew"),
  metadata_csv = system.file("extdata", "bleo_sub_meta.csv", package = "smew"),
  output_dir = tempdir()
)
#> Checking input files exist and are readable...
#> Checking required columns in input files...
#> Checking pixel_id overlap between intensity matrix and metadata...
#> Ordering metadata to match intensity matrix...
#> Mapping m/z values to feature names...
#> No user-supplied annotations provided, using m/z values as feature names and mapping using internal annotation function
#> adducts or ion_mode is NULL; skipping annotation. anno will have NA names.
#> Keeping all 2516 peaks for downstream analysis.
#> Computing GCD values for grid spacing...
#> GCD values for grid spacing: 70
#> Transforming sample coordinates to gcd 1 for app...
#> Creating bulk sample-level metadata and intensity matrix...
#> Skipping denoising step...
#> Copied package www images to /var/folders/vb/gvs9l8xj3q10l987dbjg2bkw0000gq/T//RtmpPOujCs/smew_app/www
#> Copied package figures to /var/folders/vb/gvs9l8xj3q10l987dbjg2bkw0000gq/T//RtmpPOujCs/smew_app/figures
#> Saving processed data for app...
#> Static app.R written to /var/folders/vb/gvs9l8xj3q10l987dbjg2bkw0000gq/T//RtmpPOujCs/smew_app/app.R
#> [1] "/var/folders/vb/gvs9l8xj3q10l987dbjg2bkw0000gq/T//RtmpPOujCs/smew_app/app.R"
# Clean up: remove everything written to the temporary directory
unlink(list.files(tempdir(), full.names = TRUE), recursive = TRUE)