How to get started

Step 1: Installing multiome-wf

Running multiome-wf requires cloning the Snakemake pipeline to your local system and installing the necessary packages into a conda environment. Follow the instructions on the Setting up a project page.

If you are new to multiome-wf, please review the directory structure described on the Core pipeline structure page before starting configuration.
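
A typical setup follows the pattern sketched below; the repository URL and environment file name here are placeholders, so use the exact commands given on the Setting up a project page.

# Clone the workflow (placeholder URL; see Setting up a project for the real one)
git clone https://github.com/<org>/multiome-wf.git
cd multiome-wf
# Create and activate the conda environment (the environment file name is an assumption)
conda env create -p ../env -f environment.yaml
conda activate ../env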

Step 2: Configuring sample tables

If your input files were created per sample, complete the samples.tsv in your configuration folder. If your input files aggregate all samples, fill out the aggregates.tsv in addition to samples.tsv. Refer to General configuration for the hierarchy of configuration directories and files.
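
The exact columns required in samples.tsv are defined in General configuration. As a rough illustration only, it is a tab-separated table with one row per sample; the column names below are hypothetical placeholders, not the actual schema.

# Hypothetical layout for illustration only (tab-separated columns);
# use the columns required by General configuration
sample_id	modality	path
sample_A	multiome	/data/project/sample_A
sample_B	multiome	/data/project/sample_B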

Note

multiome-wf organizes sample information by modality, and any sample tables unrelated to your analysis modality are disregarded. For example, the sample table for Multiome should be placed in the multiome-config directory.

Step 3: Configuring Snakemake pipeline

The multiome-wf reads the config.yaml file in your configuration folder. Modify this file according to the instructions on the Config YAML page.
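
For orientation only, config.yaml is a set of key-value pairs along the lines of the sketch below; the keys shown are hypothetical placeholders, and the authoritative schema is described on the Config YAML page.

# Hypothetical keys for illustration only; follow the Config YAML page for the real schema
samples: config/multiome-config/samples.tsv
aggregates: config/multiome-config/aggregates.tsv
reference: /path/to/reference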

Note

multiome-wf assumes that your config.yaml and sample tables are located in the same configuration directory. Any config.yaml files not related to your analysis modality will be disregarded.

Step 4: Updating Snakefile

Once you have finished configuring your sample tables and config.yaml, edit your Snakefile accordingly. First, point the configfile directive to your config.yaml file, as shown below:

configfile: "config/multiome-config/config.yaml"

Update the resources directive of each rule according to the computational resources it requires, as shown below:

resources:
    mem_mb=1024 * 100,
    disk_mb=1024 * 50,
    runtime=60 * 12
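
For context, the resources directive sits inside each rule alongside its other directives. The sketch below is a made-up rule (its name, files, and command are placeholders), shown only to illustrate where the values above belong; in Snakemake, runtime is given in minutes.

rule example_rule:                     # hypothetical rule name
    input:
        "results/{sample}/input.h5"    # placeholder input
    output:
        "results/{sample}/output.h5"   # placeholder output
    resources:
        mem_mb=1024 * 100,   # ~100 GB of memory
        disk_mb=1024 * 50,   # ~50 GB of temporary disk
        runtime=60 * 12      # 12 hours, expressed in minutes
    shell:
        "touch {output}"     # placeholder command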

Note

If it is difficult to estimate the required resources accurately, you can adjust them after encountering out-of-resource errors during a run.

Step 5 (Optional): Configuring high performance computing

Optionally, you can run multiome-wf on a High Performance Computing (HPC) cluster using Snakemake’s cluster execution feature. Refer to the Running on a cluster page for more details.

Once you have configured your sample tables and config.yaml and updated your Snakefile accordingly, set the --configfile parameter to the path of your config.yaml file in the WRAPPER_SLURM script, as shown below:

# Run snakemake
# UPDATE the --configfile path below to point to your config.yaml
(
  time snakemake \
  -p \
  --directory $PWD \
  -k \
  --restart-times 3 \
  --rerun-incomplete \
  --rerun-triggers mtime \
  --jobname "s.{rulename}.{jobid}.sh" \
  -j 999 \
  --use-conda \
  --configfile config/multiome-config/config.yaml \
  $PROFILE_CMD \
  --latency-wait=300 \
  --max-jobs-per-second 1 \
  --max-status-checks-per-second 0.01 \
  "$@"
  ) > "Snakefile.log" 2>&1

Note

The default configuration of multiome-wf is optimized for NIH’s Biowulf and may not be suitable for other HPC systems. You may need additional configuration, including modifications to the WRAPPER_SLURM script, depending on your computing environment.

Step 6: Run

Once everything is ready, you can run the workflow with or without cluster computing. Assuming you are in the directory containing your Snakefile, use the following commands in your terminal to run without submitting a cluster job:

# Activate conda environment
conda activate ../env
# Run Snakemake
snakemake --cores 8
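
Before launching a full run, you can preview the jobs Snakemake would execute with a dry run; this is a standard Snakemake option rather than anything specific to multiome-wf.

# Show the planned jobs without running anything
snakemake -n --cores 8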

If you wish to run multiome-wf on a cluster and you have configured the pipeline for cluster computing, activate your conda environment and use the following command in the terminal:

# Run Snakemake by submitting jobs
sbatch WRAPPER_SLURM
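
After submission, you can monitor progress with standard SLURM commands and by following the log written by the wrapper, for example:

# Check the status of your queued and running jobs
squeue -u $USER
# Follow the top-level Snakemake log produced by WRAPPER_SLURM
tail -f Snakefile.log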

Note

The above command is optimized for job submission on NIH’s Biowulf. Consult your HPC documentation or administrator for guidance on running Snakemake on a cluster, if necessary.

Step 7: Results

The multiome-wf creates a results folder in the workflow directory upon completion of a run, and all output files are saved there. Refer to the Overview of output files page for more information.
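
For example, after a successful run you can inspect the output layout directly from the terminal:

# List the contents of the results folder
ls results/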