Spatial Transcriptomics Data Analysis: A Practical Introduction

Spatial transcriptomics data analysis helps you read gene expression in place—not just how much a gene is expressed, but where it happens inside a tissue. That spatial context makes it possible to map cell neighborhoods, discover tissue micro-environments, and connect molecular changes to histology. This article is a clear, lab-friendly guide to getting from raw spots or cells to interpretable biological insights.

If you've done bulk or single-cell RNA-seq before, you'll feel at home. We'll keep the focus on decisions that matter in practice—how to design the experiment, set sensible QC thresholds, choose the right normalization, cluster and annotate regions, and integrate single-cell references for deconvolution. We also highlight widely used toolchains (e.g., Space Ranger, Seurat, Squidpy, Giotto) and when each fits.

What you'll get from this guide

A step-by-step overview from preprocessing to reporting
Practical QC tips that balance sensitivity and noise
Clear routes for cell type mapping using scRNA-seq references
Reproducible workflows you can adopt or adapt in your lab

Who this is for

Researchers who want a concise starting point with enough depth to run real analyses—without wading through excessive theory. We avoid hype and aim for choices you can justify in a methods section.

Figure 1. Development of spatial transcriptomics. (Du, J., et al., Journal of Translational Medicine, 2023)

Standard Workflow for Spatial Transcriptomics Data Analysis

A typical spatial transcriptomics analysis follows a structured, stepwise workflow that transforms raw sequencing and image data into meaningful biological insights:

1. Preprocessing & Image Alignment

Run platform-specific pipelines (e.g., Space Ranger) to generate expression matrices and align tissue images.

2. Data Integration into an Analysis Framework

Load processed data into tools like Seurat, Squidpy, Giotto, or SpatialExperiment to organize spatial features and metadata.

3. Quality Control & Filtering

Identify and remove low-quality spots based on library size, gene counts, and mitochondrial content.

4. Normalization & Variable Gene Selection

Apply normalization methods (e.g., SCTransform, log-normalization) and select highly variable genes for downstream analysis.

5. Dimensionality Reduction & Clustering

Reduce dimensionality with PCA, visualize the result with UMAP, and cluster spots to identify transcriptionally distinct spatial domains.

6. Spatially Variable Gene Detection

Identify genes showing significant spatial expression patterns (SVGs) across the tissue.

7. Cell Type Deconvolution

Integrate scRNA-seq references to estimate cell-type composition per spatial spot.

Figure 2. Typical structure of spatial transcriptomics analysis. (Williams, C.G., et al., Genome Med, 2022)

Tip: Treat each step as a decision point. Your choices at early stages (like filtering or normalization) influence everything downstream.

Essential Tools for Spatial Transcriptomics Data Analysis

A wide range of open-source tools now supports spatial transcriptomics data analysis—from basic QC to high-level modeling. In this section, we introduce the most commonly used tools across five key areas of the analysis pipeline. Whether you prefer R or Python, or need flexible pipelines for multi-sample studies, these tools form the foundation of a reproducible and insightful workflow.

1. Preprocessing & Image Registration

Before analysis begins, raw data must be processed using the platform's own pipeline to align reads, generate count matrices, and register spatial coordinates with histological images.

Space Ranger (10x Genomics): Processes Visium or Visium HD data; outputs expression matrices, tissue masks, aligned images, and QC metrics.
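CosMx / GeoMx / Xenium vendor software: Comparable pipelines for the NanoString platforms (CosMx, GeoMx) and for 10x Genomics Xenium. CosMx and Xenium provide single-cell or subcellular resolution, while GeoMx profiles user-selected regions of interest.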
Manual QC overlays: Always review spot overlays on tissue images to verify correct registration.

Why it matters: Misaligned images or incorrect spot calls at this stage can undermine the entire analysis downstream.

2. Analysis Frameworks (R or Python)

Once data are processed, the next step is to load and explore them in a dedicated framework that supports spatial features, clustering, visualization, and statistical testing.

Seurat (R): Widely used for Visium data; offers modules for QC, normalization (including SCTransform), spatial clustering, and plotting.
Squidpy (Python): Built on top of Scanpy, it includes spatial graphs, neighborhood-based analysis, and integration with histology images.
Giotto (R): Offers both statistical analysis and a built-in interactive viewer; supports SVG detection and spatial interaction analysis.
SpatialExperiment (Bioconductor, R): A standardized container for spatial datasets, ideal for reproducible workflows and cross-method benchmarking.

Figure 3. Overview of the SpatialExperiment class structure. (Righelli, D., et al., Bioinformatics, 2022)

Choosing a platform: Seurat and Squidpy are widely used and actively maintained, which makes them reasonable defaults. Giotto suits interactive exploration, while SpatialExperiment fits teams that manage multiple spatial datasets and want a standardized container.

3. Cell-Type Mapping & Deconvolution

Many researchers want to infer which cell types are present in each spatial spot using single-cell RNA-seq as a reference. These tools help estimate cell-type proportions or assign labels based on transcriptomic profiles.

cell2location (Python): Bayesian framework to map fine-grained cell types; supports multi-sample comparisons and uncertainty estimation.
Tangram (Python): Learns spatial mappings between scRNA-seq and spatial datasets; fast and scalable for large tissues.
RCTD (R): A popular deconvolution tool, commonly applied to Visium and Slide-seq data; works well with curated single-cell references.

Tip: Use tissue-matched or platform-matched single-cell references whenever possible, and validate predictions using known spatial markers or histology images.

4. Spatial Clustering & Domain Detection

Spatial transcriptomics data can reveal distinct tissue regions or microenvironments through clustering methods that incorporate spatial location and expression.

BayesSpace (R): A Bayesian model that enhances resolution and refines spatial clusters beyond the spot level.
SpaGCN (Python): Combines gene expression, spatial proximity, and histological texture using graph convolutional networks.
stLearn (Python): Integrates spatial distance, morphology, and transcriptomics to reconstruct spatial trajectories and domains.

Best use: These methods are most effective when identifying subtle spatial patterns that standard clustering might overlook.

5. Reproducible Pipelines

To analyze multiple samples or standardize team workflows, automated pipelines save time and improve reproducibility.

Spacemake (Python + Snakemake): Modular and scalable pipeline that handles preprocessing, clustering, and spatial modeling across multiple datasets and platforms.
Panpipes (Python): Supports multimodal single-cell and spatial analysis including QC, integration, and cell-type mapping, built on Scanpy.

Why pipelines matter: They ensure that all steps—from filtering to reporting—are recorded, reproducible, and easy to scale up across experiments.

Summary:

There's no single best tool for all projects. Start with familiar platforms (Seurat or Squidpy), then extend to spatially-aware clustering or deconvolution as your questions evolve. For multi-sample studies or collaborative projects, invest early in a reproducible pipeline.

Common Challenges and How to Handle Them

Which tools should I use to analyze spatial transcriptomics data?

Most users start with Seurat or Squidpy for core analysis. For deconvolution, cell2location and Tangram are leading options. For spatial clustering, BayesSpace and SpaGCN are well-supported and widely used.

Are there pipelines for analyzing spatial data across multiple samples?

Yes. Spacemake and Panpipes are designed for batch processing and reproducibility, making them ideal for labs managing large projects or complex study designs.

Detailed Analysis Pipeline for Spatial Transcriptomics Data

Once you've generated spatial transcriptomics data and loaded it into your analysis environment, the real interpretation begins. This section walks through each major step of the analysis workflow—not just what to do, but how to think through each stage from a researcher's perspective.

1. Preprocessing & Data Loading

Preprocessing starts with converting raw sequencing outputs into usable expression data and registering them against histological images. This step is typically handled by the platform's own software (e.g., Space Ranger), but it's critical that you check the results carefully before moving on.

What to look for:

Are the spots aligned properly with the tissue image?
Are background spots filtered out correctly?
Are spatial coordinates scaled and centered correctly in downstream software?
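
A quick way to check the coordinate-scaling question is to convert spot positions into the pixel space of the image you actually plot on and confirm they land inside it. The sketch below assumes a Space Ranger style `scalefactors_json.json` with the `tissue_hires_scalef` key. The numeric values, the (x, y) ordering, and the helper names `to_image_pixels` and `out_of_bounds` are made up for illustration.

```python
import json

def to_image_pixels(fullres_xy, scalefactors, image="hires"):
    """Scale full-resolution pixel coordinates to the hires or lowres image."""
    scale = scalefactors[f"tissue_{image}_scalef"]
    return [(x * scale, y * scale) for x, y in fullres_xy]

def out_of_bounds(points, width, height):
    """Indices of points that fall outside a width x height image."""
    return [i for i, (x, y) in enumerate(points)
            if not (0 <= x < width and 0 <= y < height)]

# Made-up values for illustration; real ones are read from scalefactors_json.json.
scalefactors = json.loads(
    '{"tissue_hires_scalef": 0.125, "tissue_lowres_scalef": 0.0375, '
    '"spot_diameter_fullres": 90.0}'
)
spots_fullres = [(1000.0, 2000.0), (15000.0, 4000.0), (25000.0, 1000.0)]
spots_hires = to_image_pixels(spots_fullres, scalefactors, image="hires")

# Spots that cannot lie on a 2000 x 2000 hires image point to a scaling problem.
misplaced = out_of_bounds(spots_hires, width=2000, height=2000)
print(misplaced)
```

Any index reported here is worth inspecting on the overlay before moving on; a whole block of misplaced spots usually means the wrong scale factor or swapped axes rather than bad tissue.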

Once preprocessed, the data are imported into an object structure that stores:

Gene expression counts
Spot/cell coordinates
High-resolution images
QC metadata

These structured objects allow for seamless progression through the rest of the analysis pipeline in R or Python environments.

2. Quality Control (QC) & Filtering

QC is not just about removing bad data—it's about ensuring you're working with biologically meaningful signals.

What should you evaluate?

Low-complexity spots: Too few reads or genes detected
High mitochondrial content: Often indicates dying or degraded cells
Off-tissue spots: Can introduce background noise

Use violin plots, scatterplots, and spatial overlays to visualize QC metrics. Be flexible: an FFPE sample may require looser thresholds than a fresh-frozen one.

Practical tip: Save your filtering logic as a script or notebook cell. You will often revisit it when you tweak thresholds based on downstream results.
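
As a concrete version of the QC checks above, here is a minimal NumPy sketch that computes library size, genes detected, and mitochondrial percentage per spot, then builds a keep/drop mask. The toy matrix, the gene list, and every threshold are illustrative assumptions, not recommendations. In a real analysis you would typically get the same metrics from your framework (for example `scanpy.pp.calculate_qc_metrics`).

```python
import numpy as np

def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Per-spot library size, detected genes, and mitochondrial percentage."""
    counts = np.asarray(counts, dtype=float)
    # Upper-casing also catches mouse-style "mt-" gene names.
    is_mito = np.array([g.upper().startswith(mito_prefix) for g in gene_names])
    total = counts.sum(axis=1)
    n_genes = (counts > 0).sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    pct_mito = np.divide(100.0 * mito, total,
                         out=np.zeros_like(total), where=total > 0)
    return {"total_counts": total, "n_genes": n_genes, "pct_mito": pct_mito}

def keep_spots(metrics, in_tissue, min_counts, min_genes, max_pct_mito):
    """Boolean mask of spots that pass every filter."""
    return (
        np.asarray(in_tissue, dtype=bool)
        & (metrics["total_counts"] >= min_counts)
        & (metrics["n_genes"] >= min_genes)
        & (metrics["pct_mito"] <= max_pct_mito)
    )

# Toy data: rows are spots, columns are genes.
genes = ["MT-CO1", "ACTB", "GAPDH", "CD3E"]
counts = [
    [5, 40, 30, 25],   # healthy spot
    [60, 20, 20, 0],   # high mitochondrial fraction
    [0, 2, 0, 0],      # low complexity
    [2, 50, 40, 8],    # looks fine but lies off tissue
]
in_tissue = [1, 1, 1, 0]

metrics = qc_metrics(counts, genes)
# Thresholds are placeholders; tune them per tissue and preservation method.
keep = keep_spots(metrics, in_tissue, min_counts=50, min_genes=3, max_pct_mito=20.0)
print(keep)
```

Keeping the thresholds as named arguments makes it easy to rerun the filter with looser or stricter values and compare the resulting masks on the tissue image.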

3. Normalization & Feature Selection

Normalization adjusts for differences in sequencing depth and technical variation, making expression values comparable across spots.

Common approaches:

SCTransform (Seurat) – variance-stabilizing, robust across samples
Log-normalization (Scanpy/Squidpy) – straightforward and interpretable

Once normalized, select highly variable genes (HVGs) to reduce noise and focus on informative features. These genes will be used for PCA, clustering, and visualization.

Don't skip HVG filtering—without it, clustering can be driven by noise or housekeeping genes.
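
To make the two steps concrete, the sketch below applies depth scaling plus a log1p transform and then ranks genes by variance. Both functions are simplified stand-ins written for illustration: established tools use more careful models (SCTransform fits a regularized negative binomial model, and Scanpy's HVG selection accounts for the mean-variance trend), so treat the plain-variance ranking here as an assumption, not a recommended method.

```python
import numpy as np

def log_normalize(counts, target_sum=1e4):
    """Scale each spot to the same total, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid dividing by zero for empty spots
    return np.log1p(counts / totals * target_sum)

def top_variable_genes(log_expr, n_top):
    """Column indices of the n_top genes with the largest variance."""
    variances = log_expr.var(axis=0)
    order = np.argsort(variances)[::-1]
    return np.sort(order[:n_top])

# Toy data: gene 0 is a constant fraction of every spot (housekeeping-like),
# genes 1 and 2 switch on in different spots. Sequencing depth varies by spot.
counts = np.array([
    [50, 50, 0],
    [100, 100, 0],
    [50, 0, 50],
    [200, 0, 200],
])
log_expr = log_normalize(counts)
hvg_idx = top_variable_genes(log_expr, n_top=2)
print(hvg_idx)
```

After normalization, spots 0 and 1 have identical profiles even though one was sequenced twice as deeply, and the housekeeping-like gene drops out of the variable set because it no longer varies.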

4. Dimensionality Reduction & Clustering

To reveal structure in your dataset, apply dimensionality reduction and clustering.

PCA is the usual first step to capture global variation.
UMAP or t-SNE then projects the top principal components into 2D for visualization.
Clustering algorithms like Leiden or Louvain group spots with similar profiles.

Key principle: Always verify clusters on the actual tissue image. Expression-based clusters that don't correspond to anatomical structures may still be valid—but require closer scrutiny.
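
The reduce-then-cluster idea can be sketched in a few lines of NumPy. The example below runs PCA through an SVD and then groups spots in PC space. Note the substitution: a small k-means with deterministic farthest-point initialization stands in for Leiden or Louvain, which are graph-based and need a dedicated library. The simulated two-domain data and all parameter values are assumptions for illustration.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the centered matrix; returns scores and variance ratios."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]

def kmeans(points, k, n_iter=20):
    """Plain k-means; stands in for graph clustering in this sketch."""
    # Deterministic init: start at the first point, then add the farthest ones.
    centers = [points[0]]
    while len(centers) < k:
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[dist.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Simulated normalized expression: 30 spots x 20 genes, two tissue domains.
rng = np.random.default_rng(0)
domain_a = np.r_[np.full(10, 2.0), np.zeros(10)]
domain_b = np.r_[np.zeros(10), np.full(10, 2.0)]
X = np.vstack([
    domain_a + rng.normal(0, 0.2, (15, 20)),
    domain_b + rng.normal(0, 0.2, (15, 20)),
])

scores, explained = pca(X, n_components=2)
labels = kmeans(scores, k=2)
print(explained.round(2), labels)
```

The labels are the part to carry back to the tissue: plot them at the spot coordinates and compare against the histology before interpreting any cluster.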

5. Spatially Variable Gene (SVG) Detection

SVG detection identifies genes whose expression varies systematically with location in the tissue, rather than genes that are simply highly or variably expressed.

When to perform SVG analysis:

After QC and normalization
Once clusters are established or a spatial structure is suspected
To identify regional marker genes

Methods to consider:

Statistical models (e.g., SPARK-X)
Spatial autocorrelation on a neighbor graph (e.g., Moran's I in Squidpy)
Gaussian process modeling (e.g., nnSVG)

SVGs often feed into downstream analyses:

Marker gene discovery
Region annotation
Spatial trajectory modeling
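
One simple way to see what "spatially variable" means is Moran's I, a spatial autocorrelation statistic computed over a neighbor graph. The sketch below builds a k-nearest-neighbor weight matrix and scores two toy genes on a square grid. It is a bare-bones illustration: there is no permutation test or multiple-testing correction, and the grid layout and `k=4` are assumptions. Dedicated tools such as SPARK-X or nnSVG use different statistical models.

```python
import numpy as np

def knn_weights(coords, k=4):
    """Binary weight matrix linking each spot to its k nearest neighbors."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a spot is not its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]
    W = np.zeros_like(d)
    rows = np.repeat(np.arange(len(coords)), k)
    W[rows, nearest.ravel()] = 1.0
    return W

def morans_i(values, W):
    """Moran's I: (n / sum(W)) * (z' W z) / (z' z), with z the centered values."""
    x = np.asarray(values, dtype=float)
    z = x - x.mean()
    denom = np.sum(z**2)
    if denom == 0:
        return 0.0  # constant expression carries no spatial signal
    return (len(x) / W.sum()) * (z @ W @ z) / denom

# Toy 6 x 6 grid of spots.
coords = np.array([(x, y) for x in range(6) for y in range(6)])
W = knn_weights(coords, k=4)

region_gene = (coords[:, 0] >= 3).astype(float)                   # on in one half
checker_gene = ((coords[:, 0] + coords[:, 1]) % 2).astype(float)  # alternating

i_region = morans_i(region_gene, W)
i_checker = morans_i(checker_gene, W)
print(round(i_region, 2), round(i_checker, 2))
```

A clearly positive value means neighboring spots tend to share expression levels (a candidate regional marker), a value near zero suggests no spatial structure, and a negative value indicates an alternating pattern.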

Figure 4. Method schematic of SPARK-X and simulation results. (Zhu, J., et al., Genome Biol, 2021)

6. Cell Type Mapping via scRNA-seq Integration

Many spatial datasets (e.g., from 10x Visium) contain multi-cell spots. To understand cellular composition, you can map reference scRNA-seq data back to each spot.

Steps involved:

Select or generate an annotated single-cell reference dataset.
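Match gene identifiers between the spatial and scRNA-seq data and restrict both to the shared genes; check whether your chosen tool expects raw or normalized counts.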
Use tools like cell2location, Tangram, or RCTD to infer cell-type proportions.

Validation is key:

Visualize estimated abundances spatially
Compare with known marker genes or histology
Look for consistency across biological replicates

Common pitfall: Using a mismatched single-cell reference (wrong tissue, condition, or platform) can lead to misleading or uninterpretable mappings.
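
The core idea behind reference-based deconvolution can be sketched as a non-negative least squares problem: express each spot as a non-negative mixture of cell-type signatures over the shared genes. The example below is only that, a plain NNLS stand-in solved with multiplicative updates. It is not how cell2location, Tangram, or RCTD work internally; those tools use richer statistical models. The signature values, spot counts, and function names are invented for illustration.

```python
import numpy as np

def shared_genes(spatial_genes, reference_genes):
    """Genes present in both datasets, in the spatial dataset's order."""
    ref = set(reference_genes)
    return [g for g in spatial_genes if g in ref]

def estimate_proportions(spot, signatures, n_iter=500, eps=1e-12):
    """Non-negative least squares via multiplicative updates.

    signatures: genes x cell_types matrix of mean expression per cell type.
    Returns weights rescaled to sum to 1.
    """
    S = np.asarray(signatures, dtype=float)
    y = np.asarray(spot, dtype=float)
    w = np.ones(S.shape[1])
    StS, Sty = S.T @ S, S.T @ y
    for _ in range(n_iter):
        w *= Sty / (StS @ w + eps)  # update keeps every weight non-negative
    total = w.sum()
    return w / total if total > 0 else w

# Hypothetical reference signatures: gene -> [T cell, B cell, epithelial].
reference = {
    "ACTB": [5, 5, 5], "EPCAM": [1, 1, 10], "CD3E": [10, 1, 1],
    "MS4A1": [1, 10, 1], "PTPRC": [8, 8, 0],
}
# Hypothetical spot; XIST is absent from the reference and gets dropped.
spot_counts = {"CD3E": 640, "MS4A1": 370, "EPCAM": 190, "ACTB": 500, "XIST": 12}

genes = shared_genes(list(spot_counts), list(reference))
S = np.array([reference[g] for g in genes], dtype=float)
y = np.array([spot_counts[g] for g in genes], dtype=float)

props = estimate_proportions(y, S)
print(genes, props.round(3))
```

The toy spot was constructed as a 60/30/10 mixture, so the estimate can be checked directly. With real data there is no such ground truth, which is why the validation steps above matter.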

7. Spatial Domain Annotation & Biological Interpretation

At this point, you have clusters, marker genes, and inferred cell types. The final step is biological interpretation.

Questions to ask:

Do clusters align with known tissue regions (e.g., cortex, tumor margin, immune infiltrate)?
Do SVGs or cell-type maps suggest new hypotheses about tissue architecture?
How do spatial patterns relate to phenotype, condition, or treatment?

Integrate your spatial data with:

Histology images
Published atlases
In situ hybridization (ISH) or IHC validation (if available)

8. Reproducible Workflow Design

Even if you're analyzing a single sample, plan for reproducibility.

Recommended practices:

Save all filtering and clustering parameters
Use reproducible scripts or notebooks
Store intermediate files in structured folders
Consider using pipelines (e.g., Panpipes, Spacemake, or MOSAIK) for multi-sample projects

Bonus tip: Add a README or workflow diagram to your project folder for future you—or future collaborators.
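
A lightweight way to save parameters is to write one small JSON record per sample next to the results. The sketch below uses only the standard library. The field names, the parameter values, and the file name are assumptions chosen for illustration, so adapt them to whatever your project already records.

```python
import json
import platform
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def build_run_record(sample_id, params, tool_versions):
    """Collect everything needed to rerun one analysis step."""
    return {
        "sample_id": sample_id,
        "created_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "params": params,
        "tool_versions": tool_versions,
    }

def save_run_record(record, path):
    """Write the record as sorted, indented JSON so diffs stay readable."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2, sort_keys=True)

# Placeholder parameter values, not recommendations.
record = build_run_record(
    sample_id="patientA_slide1",
    params={
        "qc": {"min_counts": 500, "min_genes": 200, "max_pct_mito": 20.0},
        "clustering": {"n_pcs": 30, "resolution": 0.8},
    },
    tool_versions={"python": platform.python_version()},
)

# A temporary directory keeps the example self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "patientA_slide1.params.json"
    save_run_record(record, path)
    reloaded = json.loads(path.read_text(encoding="utf-8"))

print(reloaded["params"]["clustering"])
```

Because the file is plain JSON, it can be committed alongside the scripts and compared across runs with an ordinary diff.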

Practical Tips & Real-World Experience

While the formal analysis pipeline lays out a clear sequence of steps, actual research often brings surprises—batch effects, unexpected tissue variability, or analysis dead ends. This section collects practical advice from real-world spatial transcriptomics studies, grounded in both lab workflows and bioinformatics analysis.

Figure 5. Application scenarios of spatial transcriptomics. (Du, J., et al., Journal of Translational Medicine, 2023)

1. Experimental Design Impacts Everything Downstream

Good data starts at the bench.

Sample preservation matters:

Fresh frozen samples generally yield higher-quality data, but FFPE is more accessible for clinical specimens. Plan for lower complexity in FFPE datasets.

Tissue thickness and coverage:

Thick sections may cause partial spot dropout due to incomplete imaging or RNA diffusion.
Trimmed or uneven samples often result in low-quality border regions—flag these early during image review.

Spot resolution vs. biological question:

Visium is sufficient for most tissue-level questions.
Platforms like Xenium or CosMx offer single-cell or subcellular resolution—but require much more data processing and storage.

Design tip: Align your platform choice and tissue preparation with your biological question and downstream analysis plan.

2. Be Realistic About Quality Control

QC thresholds aren't one-size-fits-all.

Immune tissue may naturally show fewer genes per spot.
Necrotic tumor cores or fibrotic zones may have low signal but still be biologically important.
Brain tissue typically shows strong contrast between gray and white matter—expect distinct QC profiles.

Strategy: Use spatial plots to compare raw metrics visually, not just numerically. If necessary, segment QC by tissue region before applying global filters.
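
Segmenting QC by region can be as simple as computing outlier cutoffs within each region instead of across the whole section. The sketch below flags spots whose log library size falls more than a chosen number of median absolute deviations (MADs) below their own region's median. The region labels, the values, and the cutoff of 3 MADs are assumptions for illustration; region labels would normally come from a pathologist annotation or a preliminary clustering.

```python
import numpy as np

def flag_low_outliers(values, groups, n_mads=3.0):
    """Flag values below (median - n_mads * MAD), computed within each group."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    flagged = np.zeros(len(values), dtype=bool)
    for g in np.unique(groups):
        idx = groups == g
        med = np.median(values[idx])
        mad = np.median(np.abs(values[idx] - med))
        flagged[idx] = values[idx] < med - n_mads * mad
    return flagged

# Toy log library sizes: gray matter is richer than white matter by nature.
log_counts = [9.0, 9.1, 8.9, 9.2, 8.8, 9.0,   # gray matter
              7.0, 7.1, 6.9, 7.2, 6.8, 4.0]   # white matter, last spot failed
region = ["gray"] * 6 + ["white"] * 6

flagged = flag_low_outliers(log_counts, region, n_mads=3.0)
fixed_cutoff = np.asarray(log_counts) < 8.0   # a naive global threshold

print(flagged.sum(), fixed_cutoff.sum())
```

In this toy case the region-aware rule removes only the failed spot, while the fixed global cutoff would discard all of the white matter.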

3. Plan for Batch Effects and Sample Integration

Spatial transcriptomics datasets often come from multiple sections, patients, or timepoints. These can introduce technical variation that obscures true biological signals.

Use batch-aware normalization or integration tools (e.g., SCTransform with covariate regression, Harmony, Scanorama).
Track sample IDs and platform metadata from the beginning—don't retroactively reconstruct this.
Visualize batch effects in PCA or UMAP before clustering or SVG analysis.

Pro tip: Run a small pilot analysis to confirm batch behavior before committing to full integration.
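
For the pilot check, a single number can complement the visual inspection: for each spot, what fraction of its nearest neighbors in the embedding come from the same sample? The sketch below is a hypothetical diagnostic written for illustration, not an established metric from any of the tools named above. Values near the fraction expected by chance suggest good mixing, while values near 1 suggest that samples separate.

```python
import numpy as np

def same_batch_fraction(embedding, batch_labels, k=5):
    """Mean fraction of each spot's k nearest neighbors sharing its batch."""
    X = np.asarray(embedding, dtype=float)
    batches = np.asarray(batch_labels)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # exclude the spot itself
    nearest = np.argsort(d, axis=1)[:, :k]
    same = batches[nearest] == batches[:, None]
    return float(same.mean())

# Simulated 2D embeddings (for example the first two PCs) of two samples.
rng = np.random.default_rng(0)
labels = np.array(["sampleA"] * 30 + ["sampleB"] * 30)

# Case 1: sample B is shifted far from sample A (strong batch effect).
shifted = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(0, 1, (30, 2)) + 10])
# Case 2: both samples drawn from the same distribution (well mixed).
mixed = rng.normal(0, 1, (60, 2))

frac_shifted = same_batch_fraction(shifted, labels, k=5)
frac_mixed = same_batch_fraction(mixed, labels, k=5)
print(frac_shifted, round(frac_mixed, 2))
```

Keep in mind that real biology can also separate samples (for example tumor versus normal sections), so a high value is a prompt to look closer, not proof of a technical artifact.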

4. Interpret Results in Spatial and Biological Context

Gene expression patterns are only part of the story. Interpretation improves when layered with histology, tissue landmarks, or spatial features.

Compare cluster outlines with H&E features—don't trust UMAP alone.
SVGs near the tissue edge may reflect technical artifacts (e.g., tissue detachment).
Enrichment of immune or stromal markers at margins may signal real biological boundaries—or sample sectioning effects.

Biology-first mindset: Always ask, "Does this pattern make sense biologically and spatially?"
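
To check whether a pattern is concentrated at the tissue edge, you can flag edge spots by counting each spot's neighbors and then compare expression between edge and interior. The sketch below uses a toy square grid where interior spots have 4 neighbors; Visium spots sit on a hexagonal lattice with 6. The radius, the neighbor count, and the toy gene are all assumptions for illustration.

```python
import numpy as np

def neighbor_counts(coords, radius):
    """Number of other spots within `radius` of each spot."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    return (d <= radius).sum(axis=1) - 1  # subtract the spot itself

def edge_spots(coords, radius, full_neighborhood):
    """Spots with fewer neighbors than a fully surrounded interior spot."""
    return neighbor_counts(coords, radius) < full_neighborhood

def edge_to_interior_ratio(expr, is_edge):
    """Mean expression on edge spots divided by mean on interior spots."""
    expr = np.asarray(expr, dtype=float)
    interior_mean = expr[~is_edge].mean()
    if interior_mean == 0:
        return float("inf")
    return float(expr[is_edge].mean() / interior_mean)

# Toy 5 x 5 grid of in-tissue spots with unit spacing.
coords = np.array([(x, y) for x in range(5) for y in range(5)])
is_edge = edge_spots(coords, radius=1.1, full_neighborhood=4)

# Toy gene that is five times higher along the tissue border.
gene = np.where(is_edge, 5.0, 1.0)
ratio = edge_to_interior_ratio(gene, is_edge)
print(int(is_edge.sum()), ratio)
```

A strongly edge-enriched gene is not automatically an artifact, but it is a good candidate for a second look at the H&E image around the border.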

5. Invest in Reproducibility Early

When a project moves from 1–2 samples to 10+, manual workflows break down fast. Early investment in organization pays off.

Use consistent file naming (e.g., patientA_slide1.h5ad)
Save intermediate objects at each stage (raw, filtered, normalized, clustered)
Log your parameters (e.g., QC thresholds, clustering resolution) in plain text or markdown files
Use version-controlled notebooks (e.g., R Markdown, Jupyter with Git)

If working with multiple people or samples, pipelines like Spacemake, MOSAIK, or Panpipes can prevent errors and accelerate consistency.
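
Consistent naming is easiest to enforce when paths are generated by one small helper instead of typed by hand. The sketch below is one possible convention, assumed for illustration: one folder per stage and the stage name repeated in the file name. The `results` root and the `stage_path` helper are hypothetical.

```python
from pathlib import Path

# Stages mirror the intermediate objects worth saving.
STAGES = ("raw", "filtered", "normalized", "clustered")

def stage_path(root, patient, slide, stage, ext="h5ad"):
    """Build a predictable path such as results/filtered/patientA_slide1.filtered.h5ad."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {STAGES}")
    return Path(root) / stage / f"{patient}_{slide}.{stage}.{ext}"

all_paths = [stage_path("results", "patientA", "slide1", s) for s in STAGES]
for p in all_paths:
    print(p.as_posix())
```

Rejecting unknown stage names up front catches typos before they turn into stray folders that nobody can trace later.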

Common Pitfalls to Avoid

Pitfall	Why it happens	How to avoid it
Clusters don't match histology	Overreliance on UMAP	Cross-check clusters on tissue images
Poor deconvolution	Mismatched scRNA-seq reference	Use tissue- and condition-matched references
Inconsistent QC	Thresholds applied globally	Visualize QC per tissue or sample
Lost file traceability	Manual renaming, no folder structure	Standardize naming and save scripts

Summary & Suggested Next Steps

A spatial transcriptomics project moves from raw reads → clean spots → interpretable maps. Keep it structured, reproducible, and biology-first.

Workflow at a Glance

Stage	Purpose
Preprocessing & Loading	Turn raw reads/images into counts + aligned coordinates
Quality Control	Remove low-quality or off-tissue spots
Normalization & HVGs	Stabilize signals; keep informative genes
DR & Clustering	Reveal transcriptomic domains
SVG Detection	Find genes with spatial patterns
Cell-Type Mapping	Estimate cell proportions using scRNA-seq references
Interpretation & Reporting	Anchor signals to histology and biology
Reproducible Design	Scale across samples and collaborators

References

Ståhl, P.L., et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353(6294), 78–82 (2016).
Rodriques, S.G., Stickels, R.R., Goeva, A., et al. Slide-seq: A scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467 (2019).
Stickels, R.R., Murray, E., Kumar, P., et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat Biotechnol 39, 313–319 (2021).
Righelli, D., Crowell, H.L., Weber, L.M., et al. SpatialExperiment: infrastructure for spatially-resolved transcriptomics data in R using Bioconductor. Bioinformatics 38(11), 3128–3131 (2022).
Cable, D.M., Murray, E., Zou, L.S., et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat Biotechnol 40, 517–526 (2022).
Hu, J., Li, X., Coleman, K., et al. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat Methods 18, 1342–1351 (2021).
Zhu, J., Sun, S. & Zhou, X. SPARK-X: non-parametric modeling enables scalable and robust detection of spatial expression patterns for large spatial transcriptomic studies. Genome Biol 22, 184 (2021).
Du, J., et al. Advances in spatial transcriptomics and related data analysis strategies. J Transl Med 21(1), 330 (2023).
Williams, C.G., Lee, H.J., Asatsuma, T., et al. An introduction to spatial transcriptomics for biomedical research. Genome Med 14, 68 (2022).
