Frequently Asked Questions

Some package versions configured in `env.yaml` are unavailable on conda

Specific package versions can become unavailable on conda over time. The most straightforward fix is to delete the version constraints of the unavailable packages in env.yaml. If the versions that conda resolves instead are incompatible with the rest of the packages, delete all version information except for the following packages:

# In env.yaml
snakemake<8
r-seurat>5
r-signac>1.10

You may still encounter other version incompatibilities. If so, set the versions of the affected packages manually.
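The pin-stripping step described above can be sketched in Python. This is a minimal illustration, not part of multiome-wf: it assumes env.yaml follows the usual conda layout, with dependencies written one per line as "- name<operator>version", and the function name strip_pins is hypothetical. Entries with a channel prefix (such as "channel::package") or a space-separated version are left untouched.

```python
import re

# Packages whose version constraints should be kept, per the list above.
KEEP_PINNED = {"snakemake", "r-seurat", "r-signac"}

# Matches list entries such as "  - r-seurat>5" or "  - numpy=1.26.4=py312_0".
# Group 3 (the version constraint) is optional, so bare names also match.
DEP_LINE = re.compile(r"^(\s*-\s*)([A-Za-z0-9_.\-]+)(\s*[<>=!~].*)?$")


def strip_pins(yaml_text):
    """Drop version constraints from list entries, except for KEEP_PINNED."""
    cleaned = []
    for line in yaml_text.splitlines():
        match = DEP_LINE.match(line)
        if match and match.group(3) and match.group(2).lower() not in KEEP_PINNED:
            # Keep only the "- " prefix and the package name.
            line = match.group(1) + match.group(2)
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    example = "dependencies:\n  - snakemake<8\n  - numpy=1.26.4\n"
    print(strip_pins(example), end="")
```

To apply it to a real file, read env.yaml into a string, pass it to strip_pins, and write the result to a new file so the original stays available for comparison.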

My input is a non-10X Genomics dataset

The multiome-wf has been designed for use with both 10X Genomics and non-10X Genomics datasets. Place your input matrices for barcodes, features, and read counts in the same folder, as instructed in the Non-10X Genomics section. Once your input files are ready, provide the path to this folder in the sample table (samples.tsv), as instructed in the Samples Table section.

multiome-wf is incompatible with my HPC environment

Consult with your HPC staff to update your configuration based on the supported environment. Refer to the following pages for the default configuration:

How do I troubleshoot if I encounter errors?

The multiome-wf writes Snakemake log files to the workflow/logs directory. Check the error messages in these log files when troubleshooting.
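When the logs directory holds many files, a short script can list the suspicious lines in one pass. The sketch below is not shipped with multiome-wf. The keyword list is an assumption about what failures look like in the logs, and find_error_lines is a hypothetical helper name.

```python
from pathlib import Path

# Case-insensitive keywords treated as signs of failure (an assumption; adjust as needed).
KEYWORDS = ("error", "exception", "traceback")


def find_error_lines(log_dir):
    """Return (file, line number, text) for every log line containing a keyword."""
    root = Path(log_dir)
    if not root.is_dir():
        return []
    hits = []
    # Walk the directory recursively, since logs may be grouped in subfolders.
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if any(word in line.lower() for word in KEYWORDS):
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    for file_name, number, line in find_error_lines("workflow/logs"):
        print(f"{file_name}:{number}: {line}")
```

The first reported line in a log is usually the most informative one, because later errors are often consequences of it.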

I have biological and/or technical replicates

We define biological and technical replicates as outlined in the Replicates section.

After confirming that all technical replicates were properly sequenced, sum read counts across technical replicates by passing them together as the input of cellranger count (RNA-seq), cellranger-atac count (ATAC-seq), or cellranger-arc count (Multiome). Technical replicates prepared on non-10X Genomics platforms follow equivalent steps. Alternatively, the multiome-wf aggregates counts for samples labeled with the same value in the replicate column of the sample table (samples.tsv) or the aggregates table (aggregates.tsv).
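The idea of summing counts across samples that share a replicate label can be illustrated with a toy example. This is a conceptual sketch only and does not reproduce how multiome-wf implements aggregation. It assumes that technical replicates share cell barcodes (the same library sequenced more than once), and the sample names, labels, and the function aggregate_by_replicate are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_by_replicate(replicate_of, counts):
    """Sum per-(feature, barcode) counts across samples sharing a replicate label.

    replicate_of: maps sample name -> value of its replicate column.
    counts: maps sample name -> {(feature, barcode): read count}.
    """
    totals = defaultdict(Counter)
    for sample, label in replicate_of.items():
        # Counter.update adds counts for keys that already exist.
        totals[label].update(counts[sample])
    return {label: dict(total) for label, total in totals.items()}


if __name__ == "__main__":
    # Hypothetical samples: s1 and s2 carry the same replicate label.
    replicate_of = {"s1": "rep1", "s2": "rep1", "s3": "rep2"}
    counts = {
        "s1": {("GeneA", "AAAC"): 2},
        "s2": {("GeneA", "AAAC"): 3, ("GeneB", "AAAC"): 1},
        "s3": {("GeneA", "TTTG"): 4},
    }
    print(aggregate_by_replicate(replicate_of, counts))
```

Samples with distinct labels, such as independent biological replicates, stay separate in the result.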

Read counts from independent biological replicates are not aggregated, in order to preserve biological variability. For users analyzing 10X Genomics datasets, cellranger aggr (RNA-seq), cellranger-atac aggr (ATAC-seq), or cellranger-arc aggr (Multiome) is optionally available for creating a concatenated feature-barcode matrix along with a .cloupe file. The sample table (samples.tsv) accepts both per-biological-replicate matrices and a concatenated matrix. If you provide per-biological-replicate matrices, each row of the sample table corresponds to a single biological replicate. If you provide a concatenated matrix, you can supply per-biological-replicate metadata in the aggregates table, where each row corresponds to a single biological replicate. Learn more about building the sample and aggregates tables:

replicate in sample table: replicate
replicate in aggregates table: replicate

How do I explore my data?

The multiome-wf generates a modality-combined Seurat object, saved in workflow/results/combine/seurat_combined.rds. You will need the Seurat package to open this object in R. Once loaded, the object provides count matrices and dimensionality reductions for each modality, as shown below:

## An object of class Seurat
## 870542 features across 15600 samples within 9 assays
## Active assay: SCT (20354 features, 3000 variable features)
##  3 layers present: counts, data, scale.data
##  8 other assays present: Gene.Expression, Peaks, MACS, Gene.Activity, integrated_0_SCT, integrated_1_Peaks, integrated_2_MACS, integrated_3_Gene.Activity
##  19 dimensional reductions calculated: SCT_pca, SCT_umap, Peaks_lsi, Peaks_umap, MACS_lsi, MACS_umap, Gene.Activity_pca, Gene.Activity_umap, pca, integrated_0_pca, integrated_0_umap, integrated_1_lsi, integrated_1_umap, integrated_2_lsi, integrated_2_umap, integrated_3_pca, integrated_3_umap, wnn_0_umap, wnn_1_umap

The most straightforward way to explore this data is to use the functions provided in the Seurat package. If you wish to perform downstream analyses outside of Seurat, you can extract the count matrices and metadata to build the required single-cell object, or convert the Seurat object to another single-cell object format.

Warning

Seurat functions may raise errors when called with their default arguments, because the assays and dimensionality reductions in this Seurat object carry modified names. Check the naming conventions used by multiome-wf and pass the assay and reduction names explicitly.

Additional data is provided for marker genes, called peaks, and analysis reports at each step, depending on the analysis configurations. Refer to the Overview of output files for more details.

My Multiome/ATAC-seq input organism is neither human nor mouse

The current version of multiome-wf provides two options for annotating genomic loci: EnsDb packages in R or a user-provided GTF file. You can configure this option in the ANNOTATION field. One limitation of the EnsDb option is that it covers only the mm10 (mouse) and hg38 (human) reference genomes.

Input from other organisms, or mouse/human reads mapped to other reference genomes, is supported through the GTF option.

Chromosome names differ from my input in Multiome/ATAC-seq

The multiome-wf relies on the EnsDb.Mmusculus.v79 (mm10) or EnsDb.Hsapiens.v86 (hg38) annotation packages in R. If the chromosome names in your input files don’t match the conventions used in these annotation packages, it is highly recommended that you rerun the mapping process using a reference genome with a compatible release version.
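Before rerunning the mapping, it can help to confirm how the chromosome names in your input actually differ. EnsDb packages follow Ensembl naming (for example 1, X, MT), whereas many references use UCSC-style names with a chr prefix (chr1, chrX, chrM). The sketch below checks only for that prefix difference, which is one common cause of mismatches and not necessarily the one affecting your data. It assumes a tab-separated input, such as a BED or fragments file, with the chromosome in the first column; both function names are hypothetical.

```python
import gzip


def read_chromosomes(path):
    """Collect the distinct values in the first tab-separated column of a file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    names = set()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip comment and blank lines
            names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def naming_style(chromosomes):
    """Classify names as 'UCSC' (chr-prefixed), 'Ensembl' (no prefix), or 'mixed'."""
    names = list(chromosomes)
    if not names:
        raise ValueError("no chromosome names given")
    prefixed = [name.startswith("chr") for name in names]
    if all(prefixed):
        return "UCSC"
    if not any(prefixed):
        return "Ensembl"
    return "mixed"


if __name__ == "__main__":
    print(naming_style(["chr1", "chr2", "chrX"]))
    print(naming_style(["1", "2", "MT"]))
```

A result of UCSC or mixed for your input suggests that the names do not follow the Ensembl convention used by the EnsDb packages.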