Getting started with SOPA#

SOPA#

SOPA is a wonderful pipeline, similar to MCMICRO but has some fundamental differences.

Similarly, for a overview of SOPA read the paper, for a more technical overview visit the website.

General:#

SOPA runs on Snakemake, nextflow version is in development.
SOPA uses the Spatialdata object natively, some experience with SpatialData is preferred.

Advantages:#

SOPA can run on Windows.
SOPA is easier to customize than MCMICRO. For adding a function quickly to your pipeline use this (Python required).
SOPA can be used in four flavours: jupyter notebooks with python API, snakemake, nextflow, and CLI.
SOPA is compatible with various modalities: Xenium, Visium HD, MERSCOPE, CosMx, PhenoCycler, MACSima, Molecular Cartography.

Disadvantages:#

More modality compatibility leads to complexity of setup.
SOPA cannot perform Illumination correction, Stitching and Registration, or Background Subtraction.(Jose is considering helping out SOPA devs with these).
Output is spatialdata object, not super friendly to get data out of.
Segmentation output is .geojson not .tif (you can convert easily with spatialdata function).

Setup SOPA locally in your computer (for testing subsets)#

Familiarize yourself with the Snakemake framework.
Follow SOPA Getting Started.
Follow SOPA’s snakemake tutorial.

Step 1: Ensure you have conda/mamba installed#

Download micromamba from Micromamba Releases. You might have to click show all 27 assets to see the version you need.
Check that you can create environments and download packages

Step 2: Create sopa environment#

conda create --name sopa python=3.12

conda activate sopa

Install sopa (with cellpose extension) and snakemake

pip install 'sopa[cellpose]' snakemake

for some reason cellpose 4 breaks sopa, we have to install cellpose 3

pip install 'cellpose <4'

Step 3: Download sopa defaults#

SOPA have many processes, and many processes are modality specific. You can concatenate them as you want (not simple, not super complex either). To simplify things for now, We will use SOPA-provided defaults for dealing with ome.tif

In the terminal, go to directory you know and this will download the entire github repository.

git clone https://github.com/gustaveroussy/sopa.git

Step 4: Edit default parameters#

Open the following file inside the downloaded sopa directory: sopa/workflow/configs/misc/ome_tif.yaml

This is the parameter file that sets the parameters for all the processes.
For a deeper look into what each of these does check parameter_guide

We will change the cellpose parameters to the following to make it work with our demo image.

# these are the settings you should have (ensure you save the file after you are done).
read:
  technology: ome_tif

patchify:
  patch_width_pixel: 400
  patch_overlap_pixel: 40

segmentation:
  cellpose:
    model_type: "nuclei"
    diameter: 25
    channels: [ "DAPI_bg" ]
    flow_threshold: 2
    cellprob_threshold: -6
    min_area: 25
    gaussian_sigma: 1

aggregate:
  aggregate_channels: true
  min_intensity_ratio: 0.1
  expand_radius_ratio: 0.1

explorer:
  ram_threshold_gb: 4
  pixel_size: 1

Step 5: Run sopa with defaults#

There are three paths you need to pass to the snakemake command:

path to your image
path to the edited config file
path to the workflow profile (provided by sopa download)

For the demo image copy:
RD_Coscia/Jose/__TestDatasets/TD_01_verysmall_mIF.ome.tif
and place it in here:
sopa/data/TD_01_verysmall_mIF.ome.tif

SOPA command template

snakemake \
    --config data_path=$PATH_TO_IMAGE \
    --configfile=$PATH_TO_YAML \
    --workflow-profile ./workflow/profile/local \
    --cores 4

I suggest you run the command from the sopa directory, and it will look something like this:

snakemake \
    --config data_path=./data/TD_01_verysmall_mIF.ome.tif \
    --configfile=./config/misc/ome_tif.yaml \
    --workflow-profile ./workflow/profile/local \
    --cores 6

Note, for Windows you must replace:

< \ > with a caret < ^ > for the traditional Command Prompt (cmd.exe) and a backtick <`> for PowerShell.
you can also remove them and create a single line command in an editor, and then copy paste.

Step 6: Check post-run#

Before:

.
├── command.txt
├── config
│   └── ome_tif.yaml
└── data
    └── TD_01_verysmall_mIF.ome.tif

After:

.
├── command.txt
├── config
│   └── ome_tif.yaml
└── data
    ├── TD_01_verysmall_mIF.ome.explorer
    ├── TD_01_verysmall_mIF.ome.tif
    └── TD_01_verysmall_mIF.ome.zarr

The .explorer file can be opened with the Xenium explorer software (free). The .zarr file can be opened with spatialdata.

Run SOPA on HPC#

Create a conda environment with snakemake, snakemake-slurm, and snakemake-executor-plugin-slurm.
Use profile that will use the executor plugin check Snakemake with Slurm

Example bash script for single snakemake run

#!/bin/bash

#SBATCH --partition=medium
#SBATCH --job-name=smk_main_${job_prefix}
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --time=24:00:00
#SBATCH --mem-per-cpu=250M
#SBATCH --output=slurm_logs/%x-%j.log

FILE_PATH=$1  # The file path is passed as an argument to the script
filename=$(basename "$FILE_PATH")
job_prefix="${filename:0:4}"

export SBATCH_DEFAULTS="--output=slurm_logs/%x-%j.log"

date

source /data/cephfs-1/home/users/jnimoca_m/work/miniconda/etc/profile.d/conda.sh

echo "Activate sopa env"
conda activate sopa

echo "Running snakemake with file $FILE_PATH"
srun snakemake \
--config data_path="$FILE_PATH" \
--use-conda -j 100 --profile=cubi-v1

date

Example array script to run many images

files=(
    "992_backsub.ome.tif"
    "993_backsub.ome.tif"
    "994_backsub.ome.tif"
    "997_backsub.ome.tif"
)

for file in "${files[@]}"; do
    echo "Submitting job for file $file"
    sbatch snakemake_run.sh "data/input/${file}"
done

Example custom snakemake file; pipeline with every step and parameters

configfile: "ome_tif.yaml"

# version 2.1.0
# date: 09.09.2024
# comments:
# v2.0.0: increased time of quantification rule to 6 hours and changed partition to medium
# v2.1.0: added pyrimidize

from utils import WorkflowPaths, Args

paths = WorkflowPaths(config)
args = Args(paths, config)

localrules: all

rule all:
    input:
        f"data/quantification/{paths.sample_name}_quantification.csv",
        f"data/processed_images/{paths.sample_name}_8bit_pyrimidized.ome.tif"
    shell:
        """
        echo 🎉 Successfully run sopa
        echo → SpatialData output directory: {paths.sdata_path}
        """

rule downscale_image:
    input:
        paths.data_path,
    output:
        path_downscaled_image = f"data/processed_images/{paths.sample_name}_8bit.tif"
    conda:
        "sopa"
    resources:
        mem_mb=256_000,
        partition="short",
        runtime="3h",
    threads: 4
    shell:
        """
        mkdir -p ./data/processed_images

        python scripts/downscale_image_to8bit.py \
        --input {paths.data_path} \
        --output {output.path_downscaled_image} \
        """

rule pyrimidize:
    input:
        processed_imaged = f"data/processed_images/{paths.sample_name}_8bit.tif"
    output:
        pyrimidized_image = f"data/processed_images/{paths.sample_name}_8bit_pyrimidized.ome.tif"
    conda:
        "sopa"
    resources:
        mem_mb=128_000,
        partition="medium",
        runtime="6h",
    threads: 4
    params:
        tilesize = config["pyramid"]["tile-size"],
    shell:
        """
        python scripts/pyramidize.py \
        --input {input.processed_imaged} \
        --output {output.pyrimidized_image} \
        --tile-size {params.tilesize}
        """

rule to_spatialdata:
    input:
        paths.data_path if config["read"]["technology"] != "uniform" else [],
    output:
        paths.sdata_zgroup if paths.data_path else [],
    conda:
        "sopa"
    resources:
        mem_mb=128_000,
        partition="short",
        runtime="3h",
    threads: 2
    params:
        args_reader = str(args['read'])
    shell:
        """
        sopa read {paths.data_path} --sdata-path {paths.sdata_path} {params.args_reader}
        """

checkpoint patchify_cellpose:
    input:
        paths.sdata_zgroup,
    output:
        patches_file = paths.smk_patches_file_image,
        patches = touch(paths.smk_patches),
    params:
        args_patchify = str(args["patchify"].where(contains="pixel")),
    conda:
        "sopa"
    resources:
        mem_mb=32_000,
        partition="short",
        runtime="1h",
    threads: 4
    shell:
        """
        sopa patchify image {paths.sdata_path} {params.args_patchify}
        """

rule patch_segmentation_cellpose:
    input:
        paths.smk_patches_file_image,
        paths.smk_patches,
    output:
        paths.smk_cellpose_temp_dir / "{index}.parquet",
    conda:
        "sopa"
    resources:
        mem_mb=32_000,
        partition="short",
        runtime="1h",
    threads: 4
    params:
        args_cellpose = str(args["segmentation"]["cellpose"]),
    shell:
        """
        sopa segmentation cellpose {paths.sdata_path} --patch-dir {paths.smk_cellpose_temp_dir} --patch-index {wildcards.index} {params.args_cellpose}
        """


def get_input_resolve(name, dirs=False):
    def _(wilcards):
        with getattr(checkpoints, f"patchify_{name}").get(**wilcards).output.patches_file.open() as f:
            return paths.cells_paths(f.read(), name, dirs=dirs)
    return _

rule resolve_cellpose:
    input:
        get_input_resolve("cellpose"),
    output:
        touch(paths.smk_cellpose_boundaries),
    conda:
        "sopa"
    resources:
        mem_mb=32_000,
        partition="short",
        runtime="1h",
    threads: 4
    shell:
        """
        sopa resolve cellpose {paths.sdata_path} --patch-dir {paths.smk_cellpose_temp_dir}
        """

rule rasterize:
    input:
        paths.sdata_zgroup,
        paths.smk_cellpose_boundaries
    output:
        mask_tif=f"data/masks/{paths.sample_name}_mask.tif"
    conda:
        "sopa"
    resources:
        mem_mb=256_000,
        partition="short",
        runtime="1h",
    threads: 4
    shell:
        """
        mkdir -p ./data/masks

        python scripts/rasterize.py \
        --input {paths.sdata_path} \
        --output {output.mask_tif}
        """

rule expand_markers:
    input:
        mask = f"data/masks/{paths.sample_name}_mask.tif"
    output:
        exp_mask = f"data/masks/{paths.sample_name}_mask_expanded.tif"
    resources:
        mem_mb=256_000,
        partition="short",
        runtime="1h",
    threads: 4
    params:
        pixels = config["expand"]["pixels"]
    shell:
        """
        python scripts/expand_mask_singlemask.py \
        --input {input.mask} \
        --output {output.exp_mask} \
        --pixels {params.pixels} \
        """

rule quantify:
    input:
        image = paths.data_path,
        mask = f"data/masks/{paths.sample_name}_mask_expanded.tif",
        markers = "data/input/markers.csv"
    output:
        quantification = f"data/quantification/{paths.sample_name}_quantification.csv"
    conda:
        "sopa"
    resources:
        mem_mb=256_000,
        partition="medium",
        runtime="6h",
    threads: 4
    params:
        math = config["quantify"]["math"],
        quantile = config["quantify"]["quantile"],
    shell:
        """
        mkdir -p ./data/quantification

        python scripts/quant.py \
        --image {input.image} \
        --label {input.mask} \
        --markers {input.markers} \
        --output {output.quantification} \
        --math {params.math} \
        --quantile {params.quantile}
        """

Thank you for your interest!

Getting started with SOPA

Contents