Setting up a project
The general steps to use multiome-wf in a new project are:
1. Installation: download the repository to your local system, into the directory where you want to perform the data analysis. Run one of the following commands in your terminal:
# Option 1: using SSH keys
$ git clone git@github.com:NICHD-BSPC/multiome-wf.git <project_name>
# Option 2: using HTTPS
$ git clone https://github.com/NICHD-BSPC/multiome-wf.git <project_name>
2. Configure: set up the samples.tsv table for experiments and edit the configuration file.
Optionally, set up aggregates.tsv for cellranger-aggr barcode/library mapping
and assays.tsv if you calculated custom count matrices.
3. Run: activate the environment and run the Snakemake file either locally or on a cluster.
Note
multiome-wf is tested and heavily used on Linux.
It is likely to work on macOS as long as all relevant conda packages are available for macOS – though this is not tested.
It will not work on Windows due to a general lack of support of Windows in bioinformatics tools.
The workflow is optimized based on the software versions in env.yaml.
Snakemake versions compatible with the current version of multiome-wf are >=7.29 and <8.0.
The current version of multiome-wf supports input datasets mapped to the mm10 (mouse) or hg38 (human) reference genomes.
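The Snakemake version constraint in the note above can be checked before running anything. The following is a minimal sketch, not part of multiome-wf: the function name `snakemake_version_ok` is hypothetical, and it only inspects the major and minor components of the version string.

```shell
# Hypothetical helper (not shipped with multiome-wf): succeed only if the
# given version string satisfies >=7.29 and <8.0.
snakemake_version_ok() {
    version="$1"
    major="${version%%.*}"   # text before the first dot
    rest="${version#*.}"     # text after the first dot
    minor="${rest%%.*}"      # second component

    # Reject anything that is not a plain non-negative integer.
    for part in "$major" "$minor"; do
        case "$part" in
            ''|*[!0-9]*) return 1 ;;
        esac
    done

    # With an upper bound of <8.0, only the 7.x series can qualify.
    [ "$major" -eq 7 ] && [ "$minor" -ge 29 ]
}
```

With the environment activated, a check could look like `snakemake_version_ok "$(snakemake --version)" || echo "unsupported Snakemake version"`.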
1. Installation
Conda
The main prerequisite for multiome-wf is conda, with the bioconda channel set up and mamba, the drop-in replacement for conda, installed.
If you have not done so already, install conda or miniconda and then mamba into your base environment. See links above for details.
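For reference, the bioconda channel setup mentioned above usually amounts to three `conda config` calls. The sketch below wraps them in a function so nothing runs until you call it; the function name `setup_bioconda_channels` is hypothetical, and the channel order follows the general Bioconda setup instructions rather than anything specific to multiome-wf. Note that these calls modify your user-level conda configuration.

```shell
# Hypothetical helper: configure conda channels for bioconda.
# Each call edits the user's conda configuration (typically ~/.condarc).
setup_bioconda_channels() {
    conda config --add channels bioconda &&
    conda config --add channels conda-forge &&
    conda config --set channel_priority strict
}
```

Call `setup_bioconda_channels` once after installing conda. If your channels are already configured, skip this step.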
Clone Repository
Clone this repository into a project directory, using the following command:
$ git clone https://github.com/NICHD-BSPC/multiome-wf.git
If you have SSH keys set up for GitHub, run the following command:
$ git clone git@github.com:NICHD-BSPC/multiome-wf.git
Enter the project directory and create an environment based on an environment YAML file:
$ cd multiome-wf
$ mamba env create --prefix ./env --file env.yaml
Since we specify a YAML environment file, be sure to use mamba env create
instead of the more common mamba create command.
2. Configure
This step takes the most effort. The first time you set up a project it will take some time to understand the configuration system.
See Configuration details for more.
3. Run
Activate the main conda environment and go to the workflow you want to run. For example, if you have configured a scRNA-seq run, then do:
$ conda activate ./env
and run the following:
# Dry run
$ snakemake -n
If all goes well, this should print a list of jobs to be run.
You can run locally, but this is NOT recommended. To run locally, choose the
number of CPUs you want to use with the -j argument as is standard for
Snakemake.
Warning
If you haven’t made any changes to the Snakefile, be aware that the default configuration needs a lot of RAM. Adjust the Snakefiles accordingly if you don’t have enough RAM available (search for “Xmx” to find the Java args that set memory).
# Run locally (not recommended)
$ snakemake --use-conda -j 8
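When running locally, it can help to cap the `-j` value at the number of cores the machine actually has. This is a minimal sketch under that assumption; the helper names `available_cores` and `pick_jobs` are hypothetical and not part of multiome-wf.

```shell
# Hypothetical helpers for choosing a -j value on a local machine.

# Print the number of online CPU cores, or 1 if it cannot be determined.
available_cores() {
    n="$(getconf _NPROCESSORS_ONLN 2>/dev/null)"
    case "$n" in
        ''|*[!0-9]*) n=1 ;;   # empty or non-numeric output: fall back to 1
    esac
    echo "$n"
}

# Print the smaller of the requested job count and the available cores.
pick_jobs() {
    requested="$1"
    cores="$(available_cores)"
    if [ "$requested" -gt "$cores" ]; then
        echo "$cores"
    else
        echo "$requested"
    fi
}
```

A local run could then be started with `snakemake --use-conda -j "$(pick_jobs 8)"`. This caps CPU usage only; the memory caveat in the warning above still applies.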
When running on a cluster, monitor the various jobs that will be submitted on your behalf. See Running on a cluster for more details on this.
Note
You can execute Snakemake jobs on a cluster using cluster profiles. Consult the configuration of your high-performance computing system. The current pipeline relies on the snakemake_profile supported by NIH Biowulf.
You can typically run simultaneous workflows when they are in different directories; see Overview of workflows for details.