Frequently Asked Questions
My input is a non-10X Genomics dataset
The multiome-wf has been designed for use with both 10X Genomics and non-10X Genomics datasets.
Place your input matrices for barcodes, features, and read counts in the same folder, as instructed
in the Non-10X Genomics section. Once your input files are ready, provide the path to this folder
in the sample table (samples.tsv), as instructed in the Samples Table section.
multiome-wf is incompatible with my HPC environment
Consult your HPC staff to update the configuration for your environment. Refer to the following pages for the default configuration:
How do I troubleshoot if I encounter errors?
The multiome-wf writes Snakemake log files to the workflow/logs directory. Check the error messages
in these log files to troubleshoot.
I have biological and/or technical replicates
We define biological and technical replicates as outlined in the Replicates section.
After confirming that all technical replicates were sequenced properly, sum the read counts across
technical replicates by supplying all of them as input to cellranger count (RNA-seq),
cellranger-atac count (ATAC-seq), or cellranger-arc count (Multiome).
For technical replicates prepared on non-10X Genomics platforms, follow the equivalent steps. Alternatively,
the multiome-wf aggregates counts across samples labeled with the same value
in the replicate column of the sample table (samples.tsv) or the aggregates table
(aggregates.tsv).
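Conceptually, summing technical replicates means adding counts that share the same feature and barcode. The Python sketch below groups samples by the replicate column and sums their sparse counts. The sample column name and the in-memory count dictionaries are hypothetical, and the sketch assumes technical replicates come from the same library, so a barcode identifies the same cell in each. It only illustrates the idea and is not how multiome-wf implements aggregation.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Sparse counts for one library: (feature, barcode) -> read count
Counts = Dict[Tuple[str, str], int]


def sum_technical_replicates(
    samples: List[Dict[str, str]], counts: Dict[str, Counts]
) -> Dict[str, Counter]:
    """Sum counts across samples that share a value in the 'replicate' column.

    `samples` mimics rows of a sample table; the 'sample' column name is hypothetical.
    Assumes all technical replicates come from one library, so a barcode
    denotes the same cell in every replicate.
    """
    merged: Dict[str, Counter] = defaultdict(Counter)
    for row in samples:
        # Counter.update adds counts for keys that are already present
        merged[row["replicate"]].update(counts[row["sample"]])
    return dict(merged)
```

Samples with different replicate values stay in separate groups, which mirrors why biological replicates should carry distinct labels.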
Read counts from independent biological replicates are not aggregated, in order to preserve biological
variability. For 10X Genomics datasets, you can optionally run cellranger aggr (RNA-seq),
cellranger-atac aggr (ATAC-seq), or cellranger-arc aggr (Multiome) to create
a concatenated feature-barcode matrix along with a .cloupe file. The sample table
(samples.tsv) accepts either per-biological replicate matrices or a concatenated matrix.
If you provide per-biological replicate matrices, each row of the sample table corresponds to a single
biological replicate. If you provide a concatenated matrix, you can supply per-biological replicate
metadata in the aggregates table, where each row corresponds to a single biological replicate.
Learn more about building the sample and aggregates tables:
How do I explore my data?
The multiome-wf generates a modality-combined Seurat object, saved in
workflow/results/combine/seurat_combined.rds. You need the Seurat package to work with this object in R. Once the object is loaded (for example, with readRDS()),
it provides count matrices and dimensionality reductions for each modality, as shown below:
## An object of class Seurat
## 870542 features across 15600 samples within 9 assays
## Active assay: SCT (20354 features, 3000 variable features)
## 3 layers present: counts, data, scale.data
## 8 other assays present: Gene.Expression, Peaks, MACS, Gene.Activity, integrated_0_SCT, integrated_1_Peaks, integrated_2_MACS, integrated_3_Gene.Activity
## 19 dimensional reductions calculated: SCT_pca, SCT_umap, Peaks_lsi, Peaks_umap, MACS_lsi, MACS_umap, Gene.Activity_pca, Gene.Activity_umap, pca, integrated_0_pca, integrated_0_umap, integrated_1_lsi, integrated_1_umap, integrated_2_lsi, integrated_2_umap, integrated_3_pca, integrated_3_umap, wnn_0_umap, wnn_1_umap
The most straightforward way to explore the data is with the functions provided in the Seurat package. To perform downstream analyses outside Seurat, you can extract the count matrices and metadata to build the single-cell object your tool requires, or convert the Seurat object directly to another single-cell format.
Warning
Calling Seurat functions with their default arguments may raise errors, because multiome-wf renames the assays and dimensionality reductions in the Seurat object. For example, the UMAP computed on the SCT assay is stored as SCT_umap rather than umap. Check the naming conventions in multiome-wf carefully when setting arguments such as assay or reduction.
Depending on the analysis configuration, multiome-wf also outputs marker genes, called peaks, and analysis reports for each step. Refer to the Overview of output files for more details.
My Multiome/ATAC-seq input organism is neither human nor mouse
The current version of multiome-wf provides two options for annotating genomic loci:
using EnsDb packages in R or a user’s GTF file. Users can configure this annotation option
in the ANNOTATION field. One limitation of the EnsDb option is that it supports only the mm10
(mouse) and hg38 (human) reference genomes.
Data from other organisms, and mouse or human reads mapped to other reference genomes, are
supported through the GTF option.
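A GTF file is a tab-separated text format with nine columns, the last holding key-value attributes such as gene_id and gene_name. The Python sketch below extracts gene records, which can help you confirm that a GTF's chromosome names and gene identifiers match your data before setting ANNOTATION. It splits attributes on semicolons, so it assumes no attribute value contains one, and repeated keys (such as tag) keep only their last value. It does not reflect how multiome-wf parses the file internally.

```python
from typing import Dict, Iterable, Iterator


def parse_gtf_attributes(field: str) -> Dict[str, str]:
    """Parse a GTF attribute column such as 'gene_id "G1"; gene_name "Abc";'."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def iter_genes(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield chromosome, coordinates, strand, and attributes for each 'gene' record."""
    for line in lines:
        # Skip header/comment lines and blank lines
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # Column 3 holds the feature type; keep only gene-level rows
        if len(cols) != 9 or cols[2] != "gene":
            continue
        record = {"seqname": cols[0], "start": cols[3], "end": cols[4], "strand": cols[6]}
        record.update(parse_gtf_attributes(cols[8]))
        yield record
```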
Chromosome names in my Multiome/ATAC-seq input differ from the annotation
The multiome-wf relies on the EnsDb.Mmusculus.v79 (mm10) or EnsDb.Hsapiens.v86 (hg38)
annotation packages in R. If the chromosome names in your input files don't match the conventions
used in these annotation packages, we strongly recommend re-running the mapping
using a reference genome with a compatible release version.
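Before re-running the mapping, you could confirm the mismatch by comparing chromosome names. EnsDb packages use Ensembl-style names (1, X, MT), whereas many references, including those distributed by 10X Genomics, use UCSC-style names (chr1, chrX, chrM). The Python sketch below collects names from the first column of a BED-like file such as a fragments or peaks file, classifies the naming style, and lists names missing from a given annotation set. The helper names are hypothetical, and the annotation's names would need to be exported separately (for example, with seqlevels() in R).

```python
from typing import Iterable, Set


def seqnames_from_bed(lines: Iterable[str]) -> Set[str]:
    """Collect chromosome names from column 1 of a BED-like file (e.g. fragments or peaks)."""
    names = set()
    for line in lines:
        # Skip blank lines, comment/header lines, and UCSC track/browser lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def naming_style(names: Set[str]) -> str:
    """Classify names as 'UCSC' (chr1), 'Ensembl' (1), 'mixed', or 'unknown' (empty)."""
    if not names:
        return "unknown"
    prefixed = {name for name in names if name.startswith("chr")}
    if prefixed == names:
        return "UCSC"
    if not prefixed:
        return "Ensembl"
    return "mixed"


def missing_from_annotation(input_names: Set[str], annotation_names: Set[str]) -> Set[str]:
    """Return chromosome names found in the input but absent from the annotation."""
    return input_names - annotation_names
```

For a gzipped fragments file, pass gzip.open(path, "rt") as the line source to seqnames_from_bed.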