Setting up a project

The general steps to use multiome-wf in a new project are:

1. Installation: clone the repository into the directory where you want to perform the data analysis by running one of the following commands in your terminal:

# Option 1: using SSH keys
$ git clone git@github.com:NICHD-BSPC/multiome-wf.git <project_name>

# Option 2: using HTTPS
$ git clone https://github.com/NICHD-BSPC/multiome-wf.git <project_name>

2. Configure: set up the samples.tsv table describing your experiments and edit the configuration file. Optionally, set up aggregates.tsv for the cellranger-aggr barcode/library mapping, and assays.tsv if you calculated custom count matrices.

3. Run: activate the conda environment and run the Snakefile, either locally or on a cluster.

Note

multiome-wf is tested and heavily used on Linux.
It is likely to work on macOS as long as all relevant conda packages are available for macOS, though this is not tested.
It will not work on Windows due to a general lack of Windows support in bioinformatics tools.
The workflow is optimized for the software versions specified in env.yaml.
Snakemake versions compatible with the current version of multiome-wf are >=7.29 and <8.0.
The current version of multiome-wf supports input datasets mapped to the mm10 (mouse) or hg38 (human) reference genomes.
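Because only Snakemake >=7.29 and <8.0 is listed as compatible, it can help to check the installed version before running anything. The following is a minimal POSIX shell sketch. The function name `snakemake_version_ok` is hypothetical and not part of multiome-wf, and it only compares the major and minor version numbers.

```shell
# Hypothetical helper (not part of multiome-wf): succeeds only if the given
# Snakemake version string is >=7.29 and <8.0. Patch levels are ignored.
snakemake_version_ok() {
    local major rest minor
    major=${1%%.*}          # "7.32.4" -> "7"
    rest=${1#*.}            # "7.32.4" -> "32.4"
    minor=${rest%%.*}       # "32.4"   -> "32"
    # Reject empty or non-numeric components.
    case "$major" in ''|*[!0-9]*) return 1 ;; esac
    case "$minor" in ''|*[!0-9]*) return 1 ;; esac
    [ "$major" -eq 7 ] && [ "$minor" -ge 29 ]
}

# Usage, assuming snakemake is on PATH (e.g. after activating ./env):
#   snakemake_version_ok "$(snakemake --version)" ||
#       echo "Snakemake version outside >=7.29,<8.0" >&2
```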

1. Installation

Conda

The main prerequisite for multiome-wf is conda, with the bioconda channel set up and mamba (a drop-in replacement for conda) installed.

If you have not done so already, install conda or miniconda, and then install mamba into your base environment. See the conda, bioconda, and mamba documentation for details.
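Before cloning, you may want to confirm that conda and mamba are on your PATH. The sketch below uses a hypothetical helper, `missing_tools`, which is not part of multiome-wf. The commented channel commands follow the bioconda documentation at the time of writing and may change, so check that documentation before running them.

```shell
# Hypothetical helper (not part of multiome-wf): print the names of any
# commands not found on PATH; succeed only when none are missing.
missing_tools() {
    local tool missing=""
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || missing="$missing $tool"
    done
    missing=${missing# }    # drop the leading space
    echo "$missing"         # empty output means everything was found
    [ -z "$missing" ]
}

# Usage:
#   missing_tools conda mamba || echo "install the tools listed above first" >&2
#
# Channel setup described in the bioconda documentation (run once):
#   conda config --add channels bioconda
#   conda config --add channels conda-forge
#   conda config --set channel_priority strict
# One common way to add mamba to the base environment:
#   conda install -n base -c conda-forge mamba
```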

Clone Repository

Clone this repository into a project directory, using the following command:

$ git clone https://github.com/NICHD-BSPC/multiome-wf.git

If you have SSH keys set up for GitHub, run the following command:

$ git clone git@github.com:NICHD-BSPC/multiome-wf.git

Enter the project directory (multiome-wf by default, or <project_name> if you cloned into a custom directory) and create an environment from the environment YAML file:

$ cd multiome-wf
$ mamba env create --prefix ./env --file env.yaml

Since we specify a YAML environment file, be sure to use mamba env create instead of the more common mamba create command.

2. Configure

This step takes the most effort. The first time you set up a project, expect to spend some time learning the configuration system.

See Configuration details for more.
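Formatting mistakes in samples.tsv (and in the optional aggregates.tsv and assays.tsv) are easiest to catch before a dry run. The hypothetical `check_tsv` function below is a minimal sketch and not part of multiome-wf. It only verifies that each row has the same number of tab-separated fields as the header. It does not know which columns multiome-wf requires, so consult Configuration details for those.

```shell
# Hypothetical check (not part of multiome-wf): every non-blank row of a
# tab-separated file must have as many fields as its header line.
# Prints one message per offending row; exits non-zero if any were found.
check_tsv() {
    awk -F '\t' '
        NR == 1 { ncol = NF; next }   # remember the header width
        NF == 0 { next }              # skip blank lines
        NF != ncol {
            printf "%s: line %d has %d fields, expected %d\n", FILENAME, NR, NF, ncol
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Usage (the location of samples.tsv is an assumption; adjust the path):
#   check_tsv samples.tsv && echo "samples.tsv looks consistent"
```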

3. Run

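Activate the main conda environment from the project directory (the ./env path is relative to it), and change to the directory of the workflow you want to run, for example the one you configured for a scRNA-seq run. To activate the environment: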

$ conda activate ./env

and run the following:

# Dry run
$ snakemake -n

If all goes well, this should print a list of jobs to be run.

You can run locally, but this is NOT recommended. To run locally, set the number of CPU cores to use with the -j argument, as is standard for Snakemake.
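When running locally, a -j value larger than the number of available cores usually just oversubscribes the machine. The sketch below uses a hypothetical `pick_jobs` helper, which is not part of multiome-wf. It caps a requested job count at the number reported by `nproc` (GNU coreutils, standard on Linux) and falls back to 1 if `nproc` is unavailable.

```shell
# Hypothetical helper (not part of multiome-wf): echo the smaller of the
# requested job count and the number of CPUs reported by nproc.
pick_jobs() {
    local requested="$1" available
    # nproc is GNU coreutils; fall back to a single CPU if it is missing.
    available=$(nproc 2> /dev/null || echo 1)
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}

# Usage:
#   snakemake --use-conda -j "$(pick_jobs 8)"
```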

Warning

If you have not made any changes to the Snakefile, be aware that the default configuration needs a lot of RAM. If you do not have enough RAM available, adjust the Snakefile accordingly (search for “Xmx” to find the Java arguments that set memory limits).
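To review the Java memory settings mentioned above, you can search for "Xmx" across the workflow's rule files. This sketch assumes the rules live in files named Snakefile* or *.smk under the project directory. That is a common Snakemake convention, not something confirmed for multiome-wf. The function name `find_xmx` is hypothetical.

```shell
# Hypothetical helper (not part of multiome-wf): print file:line:text for every
# line containing "Xmx" in files named Snakefile* or *.smk under a directory
# (default: the current directory).
find_xmx() {
    find "${1:-.}" -type f \( -name 'Snakefile*' -o -name '*.smk' \) \
        -exec grep -nH 'Xmx' {} +
}

# Usage, from the project directory:
#   find_xmx
```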

# run locally (not recommended)
$ snakemake --use-conda -j 8

If you run on a cluster instead, monitor the various jobs that will be submitted on your behalf. See Running on a cluster for more details.

Note

You can execute Snakemake jobs on a cluster using cluster profiles; consult the documentation for your high-performance computing system. The current pipeline relies on the snakemake_profile supported on NIH Biowulf.

You can typically run simultaneous workflows when they are in different directories; see Overview of workflows for details.