Config YAML

This page details the various configuration options and describes how to configure a new workflow. Refer to the Configuration details section for general information about configuring multiome-wf.

While it is possible to use Snakemake mechanisms such as --config to override a particular config value and --configfile to update the config with a different file, it is easiest to edit the existing config.yaml in place. This has the additional benefit of reproducibility because all of the config information is stored in one place.

The config file uses YAML format, which can be conceptualized as a set of nested key:value pairs. When running the workflow, the YAML document is parsed into a python dictionary.
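As a rough illustration of that nesting, the sketch below hand-writes the dictionary that a small fragment of config.yaml would become after parsing. The variable name `config` is an assumption made for illustration; the keys and values are taken from the examples on this page.

```python
# Hand-written equivalent of a parsed config.yaml fragment.
# The variable name `config` is illustrative only.
config = {
    "samples": "config/multiome-config/samples.tsv",
    "macs2": {"run": "Y", "group_fragments_by": "genome"},
    "cluster": {"detection_method": 3, "resolution": None},  # YAML null becomes None
}

# Nested keys are reached by chaining dictionary lookups.
macs2_enabled = config["macs2"]["run"] == "Y"
resolution = config["cluster"]["resolution"]

print(macs2_enabled)
print(resolution)
```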

Based on the values specified in the various sections of config.yaml, the workflow automatically decides to run analysis variants suitable for scRNA-Seq, scATAC-Seq, or multi-modal experiments. There are 2 important points to keep in mind when creating a config.yaml.

1. Activating Rules

The rules for MACS2 peak calling, dataset integration, chooseR, marker gene computation, and Weighted Nearest Neighbor analysis are optional.

These rules have discrete sections in the config.yaml where users configure the execution of each rule. Use the following settings to activate or deactivate each rule:

# Activate MACS2 peak calling
macs2:
  run: "Y"

# Activate dataset integration
integrate:
  activate: true

# Activate chooseR
cluster:
  resolution: null

# Activate marker gene computation
diff_analysis:
  activate: true

# Activate Weighted Nearest Neighbor
weighted_nn:
  activate: true

Note that the chooser_paral and chooser_aggr rules only run when no pre-defined resolution is provided by the user in the cluster section.
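The switches above can be summarized with a small helper. This is a minimal sketch, not the workflow's own logic: the function name `active_optional_steps` and the returned labels are assumptions, and the input is assumed to be the dictionary obtained from parsing config.yaml.

```python
def active_optional_steps(config):
    """Return labels for the optional steps that a parsed config switches on."""
    active = []
    if config.get("macs2", {}).get("run") == "Y":
        active.append("macs2")
    if config.get("integrate", {}).get("activate", False):
        active.append("integrate")
    # chooseR only runs when no resolution is predefined (YAML null becomes None).
    if config.get("cluster", {}).get("resolution") is None:
        active.append("chooser")
    if config.get("diff_analysis", {}).get("activate", False):
        active.append("diff_analysis")
    if config.get("weighted_nn", {}).get("activate", False):
        active.append("weighted_nn")
    return active


example = {
    "macs2": {"run": "Y"},
    "integrate": {"activate": True},
    "cluster": {"resolution": 0.8},  # fixed resolution, so chooseR is skipped
    "diff_analysis": {"activate": False},
    "weighted_nn": {"activate": True},
}
print(active_optional_steps(example))
```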

2. Analysis Groups

Because of the myriad variants for single cell analysis and preprocessing, it is not possible to hard-code all the configuration options in the config.yaml file. Instead, we include analysis “group names” in many sections. These sections have a field named groups, which must contain a nested dictionary for each analysis variant.

To configure these sections, the user must specify the top-level dictionary key value. All other keys are hard-coded as options.

Using the normalize section as an example, we see four analysis groups below. Each group name, such as unintegrated_0, is itself a dictionary key for one analysis variant (for unintegrated_0, the RNA modality in Multiome). Each group’s dictionary contains additional fields which together define the group’s analysis options: assay_name and norm_method.

groups:
  unintegrated_0:
    assay_name: Gene.Expression
    norm_method: sct
  unintegrated_1:
    assay_name: Peaks
    norm_method: lsi
  unintegrated_2:
    assay_name: MACS
    norm_method: lsi
  unintegrated_3:
    assay_name: Gene.Activity
    norm_method: log

Note

It is possible to specify more analysis groups than the number of assays in your data. Do not specify analysis groups unless your experiment setup supports the condition.

For example, in the example config.yaml file, the differential analysis section, diff_analysis, contains group key names such as unintegrated_0 and integrated_0. If you are not performing Seurat integration (that is, the activate key is set to false in the integrate section), delete the integrated_* groups in the rest of the sections. If there are superfluous groups in the config.yaml, Snakemake will add extra, unwanted rules/jobs when building the DAG.
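One way to spot such leftovers before running Snakemake is sketched below. It assumes the parsed config is a plain dictionary and that integrated groups keep the integrated_ prefix; the function name `superfluous_integrated_groups` is hypothetical, and the check does not look inside the input_groups lists of the weighted_nn section.

```python
def superfluous_integrated_groups(config):
    """List (section, group) pairs to delete when integration is switched off."""
    if config.get("integrate", {}).get("activate", False):
        return []  # integration is on, so integrated_* groups are expected
    leftovers = []
    for section, body in config.items():
        # Skip scalar entries (e.g. samples) and the integrate section itself.
        if not isinstance(body, dict) or section == "integrate":
            continue
        for group in body.get("groups") or {}:
            if group.startswith("integrated_"):
                leftovers.append((section, group))
    return leftovers


example = {
    "samples": "config/multiome-config/samples.tsv",
    "integrate": {"activate": False, "groups": {"integrated_0": {}}},
    "chooser": {"groups": {"unintegrated_0": {}, "integrated_0": {}}},
    "diff_analysis": {"groups": {"unintegrated_0": {}, "integrated_0": {}}},
}
for section, group in superfluous_integrated_groups(example):
    print(f"delete {group} from {section}")
```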

Field descriptions

Config Tables

samples field

string, default samples.tsv. Defines the path to the samples table. See Samples Table for more.

Example:

samples: "config/multiome-config/samples.tsv"

# OR
# samples: "config/atac-config/samples.tsv" for scATAC-seq
# samples: "config/rna-config/samples.tsv" for scRNA-seq

aggregates field

string, default aggregates.tsv. Defines path to aggregates table. If you are using aggregated input of multiple samples created using cellranger-arc aggr (Multiome), cellranger-atac aggr (scATAC-seq), or cellranger aggr (scRNA-seq), specify the path to aggregates.tsv. Otherwise, set to an empty string (""). See Aggregates Table for more.

Example:

aggregates: "config/multiome-config/aggregates.tsv"

# OR
# aggregates: "config/atac-config/aggregates.tsv" for scATAC-seq
# aggregates: "config/rna-config/aggregates.tsv" for scRNA-seq

assays field

string, default assays.tsv. Defines path to assays table. If you are using custom counts matrices, specify path to assays.tsv. Otherwise, set to an empty string (""). See Assays Table for more.

Example:

assays: "config/multiome-config/assays.tsv"

# OR
# assays: "config/atac-config/assays.tsv" for scATAC-seq
# assays: "config/rna-config/assays.tsv" for scRNA-seq

Annotation

ANNOTATION field

string of "EnsDb" or "GTF", default "EnsDb". Defines the method to build an annotation object (GenomicRanges) for scATAC-seq and Multiome analyses.

  • "EnsDb" uses the EnsDb.Mmusculus.v79 (mouse mm10) or EnsDb.Hsapiens.v86 (human hg38) package in R

  • "GTF" uses a user-provided annotation file

ANNO_FILE field

string, default "path/to/genes.gtf.gz". If "GTF" is specified in the ANNOTATION field, provide the path to your annotation file (e.g. genes.gtf.gz). This field is disregarded if the ANNOTATION field is set to "EnsDb".

Quality Control (qc section)

remove_outliers field

boolean, default true. Specify whether or not to run qc rule.

rm_outliers_method field

string of "sd" or "iqr", default "sd". Detect outliers using either standard deviation ("sd"), or Tukey’s interquartile range ("iqr"). If set to "sd", the thresholds are determined based on +/- 3 standard deviations.

meta_labels field

list. Which metadata columns to use for filtering?

See Samples Table for more details about how metadata columns are detected. If a value is specified in this field, but is not present in the data, it will be disregarded during filtering.

lower field

dict. Key:value pairs of a metadata column and its associated lower cutoff; cells below this value are excluded. If specified, overrides the lower limit detected by the outlier method for the associated metadata columns in meta_labels. If null, the outlier method tries to remove cells automatically.

upper field

dict. Key:value pairs of a metadata column and its associated upper cutoff; cells above this value are excluded. If specified, overrides the upper limit detected by the outlier method for the associated metadata columns in meta_labels. If null, the outlier method tries to remove cells automatically.
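The interaction between the outlier method and the lower and upper overrides can be sketched as follows. This is an illustration under stated assumptions, not the workflow's implementation: the function name `outlier_limits` is hypothetical, the "sd" branch uses the sample standard deviation, and the "iqr" branch uses the conventional Tukey fences of 1.5 times the interquartile range, which may differ from the multiplier and quartile method used by the workflow.

```python
import statistics


def outlier_limits(values, method="sd", lower=None, upper=None):
    """Return (low, high) cutoffs for one metadata column."""
    if method == "sd":
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation (assumption)
        low, high = mean - 3 * sd, mean + 3 * sd
    elif method == "iqr":
        q1, _, q3 = statistics.quantiles(values, n=4)
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # conventional Tukey fences
    else:
        raise ValueError(f"unknown rm_outliers_method: {method!r}")
    # Hard cutoffs from the config take precedence over detected limits.
    if lower is not None:
        low = lower
    if upper is not None:
        high = upper
    return low, high


counts = [1000, 1200, 900, 1100, 800]
low, high = outlier_limits(counts, method="sd", lower=100)
print(low, round(high, 2))
```

Here the lower limit comes from the hard cutoff, while the upper limit is still derived from the data because no upper value was given.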

Note

  • In the example below, 4 metadata columns are specified. 3 have hard lower cut-offs (nCount_Gene.Expression, nCount_Peaks, and TSS.enrichment), and 1 has its outliers detected automatically (percent.mt).

  • 10X Genomics ATAC and Multiome kits use nuclei, so reads will not map to mitochondria. However, the workflow imputes a value of 0 for percent.mt in these assays, since missing values are not generally allowed in the underlying packages. This will not affect downstream processes such as normalization, dimensional reduction, clustering, etc.

Example:

qc:
  remove_outliers: true
  rm_outliers_method: sd
  meta_labels:
    - nCount_Gene.Expression
    - nCount_Peaks
    - percent.mt
    - TSS.enrichment
  lower:
    nCount_Gene.Expression: 100
    nCount_Peaks: 1000
    TSS.enrichment: 2
  upper: null

MACS Peak Calling (macs2 section)

MACS specific parameters.

run field

string of "Y" or "N", default "Y". Determine whether or not to run MACS. Set to "N" for RNA-seq. Set to "Y" for ATAC and Multiome requiring MACS peak calling. If you don’t run MACS, delete analysis groups where assay_name corresponds to MACS in the remaining sections/fields (e.g. unintegrated_2).

group_fragments_by field

string, default "genome". Metadata column in samples.tsv used to generate the fragments file. All labels in the specified column must have the same value. This forces generation of a single fragments file for MACS peak calling. Do not change this setting except under special circumstances.
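The single-value requirement can be checked with a few lines. The sketch below assumes the samples table has already been read into a list of row dictionaries; the function name `fragment_groups` and the samplename column are hypothetical.

```python
def fragment_groups(sample_rows, column="genome"):
    """Return the distinct labels found in the grouping column."""
    return sorted({row[column] for row in sample_rows})


# Hypothetical rows; only the "genome" column name comes from the default above.
rows = [
    {"samplename": "s1", "genome": "mm10"},
    {"samplename": "s2", "genome": "mm10"},
]
groups = fragment_groups(rows)
if len(groups) != 1:
    raise ValueError(f"expected one label in the grouping column, found {groups}")
print(groups)
```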

fasta field

string. A path to the FASTA reference genome used to map sequencing reads. Cell Ranger users can specify fasta/genome.fa in the reference directory that was used to run cellranger-atac count or cellranger-arc count.

chromsizes field

string, default "../reference/multiome.chromsizes" (Multiome) or "../reference/atac.chromsizes" (ATAC). A path to the .chromsizes file created from the FASTA reference genome.

Example:

macs2:
  run: "Y"
  group_fragments_by: genome
  fasta: "../reference/genome.fa"
  chromsizes: "../reference/multiome.chromsizes"

Normalization (normalize section)

Normalization and Principal Component Analysis (PCA).

split_by field

string. A metadata column in samples.tsv or aggregates.tsv. Datasets will be normalized, and dimensionality reduction using PCA will be performed on each dataset, split by this column. Note that Seurat integration will be performed based on the metadata column specified by split_by here and in the integrate section.

groups field

dict. The groups on which to perform normalization. Each group name (key) must be unique. Do not modify the prefix (e.g. unintegrated and integrated) except under special circumstances.

assay_name field

string of Gene.Expression, Multiplexing.Capture, Peaks, Gene.Activity, or MACS. Which Seurat assay to use. Note that Seurat assay names are “.” delimited.

norm_method field

string of log, sct, clr or lsi, default is the following:

  • Gene.Expression: sct

  • Peaks: lsi

  • MACS: lsi

  • Gene.Activity: log

  • protein: clr

Method to normalize the group’s assay. Normalize using Log (log), SCTransform (sct), Centered log ratio (clr) or latent semantic indexing (lsi). Typically, 5’ or 3’ Gene expression is normalized using Log or SCTransform methods, ATAC Peaks using LSI, and protein using CLR.
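The defaults listed above can be written as a lookup table and used to review a groups dictionary. This is a sketch only: the names `DEFAULT_NORM_METHOD`, `ALLOWED_NORM_METHODS`, and `review_normalize_groups` are assumptions, and deviating from a default is reported as a note, not treated as an error.

```python
DEFAULT_NORM_METHOD = {
    "Gene.Expression": "sct",
    "Peaks": "lsi",
    "MACS": "lsi",
    "Gene.Activity": "log",
}
ALLOWED_NORM_METHODS = {"log", "sct", "clr", "lsi"}


def review_normalize_groups(groups):
    """Return notes about unknown or non-default norm_method values."""
    notes = []
    for name, group in groups.items():
        assay = group.get("assay_name")
        method = group.get("norm_method")
        if method not in ALLOWED_NORM_METHODS:
            notes.append(f"{name}: unknown norm_method {method!r}")
        elif assay in DEFAULT_NORM_METHOD and method != DEFAULT_NORM_METHOD[assay]:
            default = DEFAULT_NORM_METHOD[assay]
            notes.append(f"{name}: {assay} uses {method}, default is {default}")
    return notes


groups = {
    "unintegrated_0": {"assay_name": "Gene.Expression", "norm_method": "log"},
    "unintegrated_1": {"assay_name": "Peaks", "norm_method": "lsi"},
    "unintegrated_2": {"assay_name": "MACS", "norm_method": "tfidf"},
}
for note in review_normalize_groups(groups):
    print(note)
```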

Example:

normalize:
  split_by: meta_geno
  groups:
    unintegrated_0:
      assay_name: Gene.Expression
      norm_method: sct
    unintegrated_1:
      assay_name: Peaks
      norm_method: lsi
    unintegrated_2:
      assay_name: MACS
      norm_method: lsi
    unintegrated_3:
      assay_name: Gene.Activity
      norm_method: log

Integration (integrate section)

Remove technical/batch effects using Seurat integration methods. Integration rule will create a new Seurat object for each integration performed.

activate field

boolean, default true. Specify whether or not to run integration.

atac_integrate_embeddings field

boolean, default true. If true, integrate low-dimensional cell embeddings (LSI coordinates) across the datasets. This is the best option for integrating multiple ATAC Peaks datasets. If false, integrate (transform) the ATAC Peaks counts matrix across datasets (not the LSI coordinates). This may overfit and is kept mainly for legacy support.

split_by field

string. A metadata column in samples.tsv or aggregates.tsv. Datasets are integrated based on this column. Ensure the same column is specified as in the split_by field of the normalize section above.

See Samples Table for more details about how metadata columns are detected.

groups field

dict. Each group to perform integration. Group name (key) must be unique.

assay_name field

string of Gene.Expression, Multiplexing.Capture, Peaks, Gene.Activity, or MACS. Ensure the same values are specified as in the normalize section above.

norm_method field

string of log, sct, clr or lsi. Ensure the same values are specified as in the norm_method field of the normalize section above.

integrate_method field

string of CCAIntegration, RPCAIntegration, HarmonyIntegration, FastMNNIntegration, scVIIntegration, or rlsi. Method to integrate unimodal datasets. For any datasets where norm_method is set to log or sct, this string is passed into the method argument of the IntegrateLayers function of Seurat. If the norm_method is set to lsi, set the integrate_method to rlsi to call the IntegrateEmbeddings function, as provided in Signac.

integrate_dims field

list of 2 integers, default [1, 30]. Range of dimensions to use for integration step.
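The consistency rules described for the integrate section can be collected into one check. The sketch below is not part of the workflow: the function name `check_integrate` and the message wording are assumptions, and it only covers the rules stated on this page (matching split_by, rlsi for lsi groups, an IntegrateLayers method for log or sct groups, and a two-element integrate_dims range).

```python
LAYER_METHODS = {
    "CCAIntegration",
    "RPCAIntegration",
    "HarmonyIntegration",
    "FastMNNIntegration",
    "scVIIntegration",
}


def check_integrate(config):
    """Return a list of problems found in the integrate section."""
    problems = []
    normalize = config.get("normalize", {})
    integrate = config.get("integrate", {})
    if normalize.get("split_by") != integrate.get("split_by"):
        problems.append("split_by differs between normalize and integrate")
    for name, group in (integrate.get("groups") or {}).items():
        norm = group.get("norm_method")
        method = group.get("integrate_method")
        if norm == "lsi" and method != "rlsi":
            problems.append(f"{name}: lsi groups expect integrate_method rlsi")
        elif norm in ("log", "sct") and method not in LAYER_METHODS:
            problems.append(f"{name}: {method!r} is not an IntegrateLayers method")
        dims = group.get("integrate_dims", [1, 30])
        if len(dims) != 2 or dims[0] > dims[1]:
            problems.append(f"{name}: integrate_dims should be [start, end]")
    return problems


example = {
    "normalize": {"split_by": "meta_geno"},
    "integrate": {
        "split_by": "meta_geno",
        "groups": {
            "integrated_0": {
                "norm_method": "sct",
                "integrate_method": "CCAIntegration",
                "integrate_dims": [1, 30],
            },
            "integrated_1": {
                "norm_method": "lsi",
                "integrate_method": "CCAIntegration",  # should be rlsi
                "integrate_dims": [1, 30],
            },
        },
    },
}
print(check_integrate(example))
```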

Example:

integrate:
  activate: true
  atac_integrate_embeddings: true
  split_by: meta_geno # this has to match the column name of your metadata indicating datasets for integration
  groups:
    integrated_0:
      assay_name: Gene.Expression
      norm_method: sct
      integrate_method: CCAIntegration  # RPCAIntegration, HarmonyIntegration, FastMNNIntegration, scVIIntegration
      integrate_dims:
        - 1
        - 30
    integrated_1:
      assay_name: Peaks
      norm_method: lsi
      integrate_method: rlsi
      integrate_dims:
        - 1
        - 30
    integrated_3:
      assay_name: Gene.Activity
      norm_method: log
      integrate_method: CCAIntegration
      integrate_dims:
        - 1
        - 30

Utilization of Toy Dataset (dataset_size config section)

Configure the use of a toy dataset. Users can take advantage of this functionality for technical purposes such as debugging. If the dataset size is smaller than the default k values used in kNN computation during integration, Seurat throws an error.

toydataset field

boolean, default false. If true, the computation is adjusted to handle toy datasets. If false, the input datasets are treated as normal datasets.

toy_k field

integer, number of neighbors used when weighting anchors. This value is passed to the k.weight argument in the IntegrateLayers function during integration.

Example:

dataset_size:
  toydataset: false
  toy_k: 10

Cluster Optimization (chooser config section)

Users can optimize clustering modularity using ChooseR with pipeline-specific modifications. This functionality is enabled only if the resolution field in the cluster section is set to null.

groups field

dict. Each group to perform clustering parameter optimization. Group name (key) must be unique. All groups in the normalize and integrate (if applicable) sections can be assigned.

npcs field

integer, default values:

  • Gene.Expression: 25

  • Peaks, MACS, or Gene.Activity: 20

The maximum number of linear reduced dimensions, computed from LSI or PCA, that are used during clustering.

resolutions field

list of numbers, default [0.6, 0.8, 1, 1.2, 1.4]. Resolutions to use when bootstrapping cluster methods. It is best to use a range that spans the target resolution.

Warning

Specifying 1.0 instead of 1 can cause an error.

silhouette field

list of strings, default silhouette, frequency_grouped, and silhouette_grouped. These values are used during path parameter expansion in rules executing chooseR. It is advisable not to alter them.

Note

All group values specified in the normalize and integrate (if applicable) config sections must have a group entry in the chooser config section.
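Both the note and the earlier warning about 1.0 can be checked mechanically. The sketch below assumes the parsed config is a plain dictionary, in which a YAML value of 1 becomes a Python int and 1.0 becomes a float; the function names `missing_chooser_groups` and `whole_number_floats` are hypothetical.

```python
def missing_chooser_groups(config):
    """Return groups from normalize/integrate that lack a chooser entry."""
    required = list(config.get("normalize", {}).get("groups") or {})
    integrate = config.get("integrate", {})
    if integrate.get("activate", False):
        required += list(integrate.get("groups") or {})
    present = config.get("chooser", {}).get("groups") or {}
    return [name for name in required if name not in present]


def whole_number_floats(resolutions):
    """Return resolutions written like 1.0 that the warning advises writing as 1."""
    return [r for r in resolutions if isinstance(r, float) and r.is_integer()]


example = {
    "normalize": {"groups": {"unintegrated_0": {}, "unintegrated_1": {}}},
    "integrate": {"activate": True, "groups": {"integrated_0": {}}},
    "chooser": {
        "groups": {"unintegrated_0": {"npcs": 25}, "integrated_0": {"npcs": 25}},
        "resolutions": [0.6, 0.8, 1.0, 1.2],
    },
}
print(missing_chooser_groups(example))
print(whole_number_floats(example["chooser"]["resolutions"]))
```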

Example:

chooser:
  groups:
    unintegrated_0:
      npcs: 25
    unintegrated_1:
      npcs: 20
    unintegrated_2:
      npcs: 20
    unintegrated_3:
      npcs: 20
    integrated_0:
      npcs: 25
    integrated_1:
      npcs: 20
    integrated_2:
      npcs: 20
    integrated_3:
      npcs: 20
  resolutions:
    - 0.6
    - 0.8
    - 1
    - 1.2
    - 1.4
  silhouette:
    - silhouette
    - frequency_grouped
    - silhouette_grouped

Clustering (cluster config section)

Users can set a specific resolution for clustering or rely on a dataset-optimized resolution computed using chooser.

detection_method field

integer, default 3. Algorithm used for community detection during unimodal clustering. Available options are:

  • 1: original Louvain algorithm

  • 2: Louvain algorithm with multilevel refinement

  • 3: SLM algorithm

  • 4: Leiden algorithm (requires the leidenalg Python package)

This value is passed to the algorithm argument of the FindClusters function. Refer to Seurat Cluster Determination for more details.

resolution field

float or null, default null. If null, clustering is performed using an optimized resolution computed by chooser.

Example:

cluster:
  detection_method: 3
  resolution: null

Weighted Nearest Neighbor (weighted_nn config section)

This section configures how to perform Weighted Nearest Neighbor (WNN) analysis. WNN is similar to shared nearest neighbor (SNN), which is commonly used to build graphs for multiple modalities. WNN uses a list of weights from each specified modality, and is useful for incorporating low dimensional embeddings from multiple single cell modalities into a global reduced dimensional space.

Note

  • All cells for specified assays/groups must have identical barcodes, meaning this rule is currently suitable ONLY for multimodal data. For example 3’ Gene Expression + CRISPR barcodes (Perturb-Seq), 3’ Gene Expression + Protein barcodes (CITE-Seq), 10X Genomics Multiome (Gene Expression + ATAC), etc.

  • Disable this functionality if the input dataset is not multimodal.

activate field

boolean, default true (Multiome) or false (RNA/ATAC). Specify whether or not to run the coembed rule.

groups field

dict. Each group to perform weighted nearest neighbor analysis. Group name (key) must be unique.

input_groups field

list of strings, default:

  • wnn_0: unintegrated_0 and unintegrated_1

  • wnn_1: integrated_0 and integrated_1

groups dictionary values from normalize and integrate config sections. Remember, unless performing multimodal integration, each group value corresponds to an assay. So in our example from the normalize config section, specifying unintegrated_0 and unintegrated_1 would combine the reduced dimensional weights of Gene.Expression and Peaks during WNN clustering.

reduction field

list of strings, default pca and lsi. Dimensionality reduction method used for a specified group. In our example from the normalize config section, specifying unintegrated_0 and unintegrated_1 would look for Gene.Expression reduced dimensions in the pca slot and Peaks reduced dimensions in the lsi slot during WNN clustering.

umap_dims field

list of integer pairs, default [[1, 25], [2, 20]]. Dimensions to use for UMAP visualization for a specified group.

resolution field

float, default 0.6. Resolution to use during community detection for multimodal clustering.

detection_method field

integer, default 3. Algorithm used for community detection during multimodal clustering. Refer to the detection_method field in the cluster section above.
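A WNN group can be sanity-checked before running the workflow. The sketch below assumes, based on the example on this page, that input_groups, reduction, and umap_dims are parallel lists with one entry per modality; the function name `check_wnn_group` is hypothetical.

```python
def check_wnn_group(name, group, known_groups):
    """Return problems found in one weighted_nn group."""
    problems = []
    inputs = group.get("input_groups", [])
    reductions = group.get("reduction", [])
    umap_dims = group.get("umap_dims", [])
    # One reduction and one dimension range is expected per input group.
    if not (len(inputs) == len(reductions) == len(umap_dims)):
        problems.append(f"{name}: input_groups, reduction and umap_dims differ in length")
    for input_group in inputs:
        if input_group not in known_groups:
            problems.append(f"{name}: unknown input group {input_group}")
    return problems


known = {"unintegrated_0", "unintegrated_1"}
wnn_0 = {
    "input_groups": ["unintegrated_0", "unintegrated_1"],
    "reduction": ["pca", "lsi"],
    "umap_dims": [[1, 25], [1, 20]],
}
print(check_wnn_group("wnn_0", wnn_0, known))
```

Here `known` stands for the group names defined in the normalize and integrate sections.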

Example:

weighted_nn:
  activate: true
  groups:
    wnn_0:
      input_groups:
        - unintegrated_0 # corresponds to SCT
        - unintegrated_1 # corresponds to Peaks
      reduction:
        - pca
        - lsi
      umap_dims:
        - - 1
          - 25
        - - 1
          - 20
      resolution: 0.6
      detection_method: 3
    wnn_1:
      input_groups:
        - integrated_0 # corresponds to SCT
        - integrated_1 # corresponds to Peaks
      reduction:
        - integrated_pca
        - integrated_lsi
      umap_dims:
        - - 1
          - 25
        - - 1
          - 20
      resolution: 0.6
      detection_method: 3

Differential Testing (diff_analysis config section)

This section configures differential testing (i.e. differential gene expression, chromatin accessibility, TF motifs) using the FindAllMarkers function in Seurat.

activate field

boolean, default true. Specify whether or not to run differential testing.

groups field

dict. Each group to perform differential testing. Group name (key) must be unique.

cluster_idents field

string, default seurat_clusters. Which Seurat metadata column to use as labels for differential testing. Equivalent to obj <- SetIdents(cluster_idents) before running FindAllMarkers(obj).

assay field

string, default null. Which assay to use for differential testing. This value is passed to the assay argument of the FindAllMarkers function.

slot field

string, default data. Which slot to pull data from. This value is passed to the slot argument of the FindAllMarkers function.

min_pct field

float or null, default null. Only test genes that are detected in a minimum fraction of cells in either of the two populations. If null, a default value of 0.01 is applied. This value is passed to the min.pct argument of the FindAllMarkers function.

test_use field

string, default null. Test used for differential testing. This value is passed to the test.use argument of the FindAllMarkers function. If null, the Wilcoxon Rank Sum test is used by default. Available methods are:

  • wilcox: Wilcoxon Rank Sum test

  • wilcox_limma: Limma implementation of the Wilcoxon Rank Sum test (Use this to reproduce results from Seurat v4)

  • bimod: Likelihood-ratio test

  • roc: ROC analysis

  • t: Student’s t-test

  • negbinom: Negative binomial generalized linear model

  • poisson: Poisson generalized linear model

  • LR: Logistic regression model

  • MAST: MAST framework

  • DESeq2: DESeq2 framework (requires the DESeq2 package to be installed in R)

For more details, refer to the FindAllMarkers function in Seurat.

latent_vars field

string, default null. Variables to test, used only when test_use is one of LR, negbinom, poisson, or MAST. This value is passed to the latent.vars argument of the FindAllMarkers function.

alpha field

float, default 0.05. False discovery rate (FDR) threshold to filter significant marker genes.

Note

Only include group values specified in the config sections: normalize, integrate (if applicable) and weighted_nn (if applicable).

Warning

For the current version of multiome-wf, LR has a bug where it grabs more nodes than allocated on a cluster node. Do not use LR on a cluster node.
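The note about group values can be turned into a quick check. This is a sketch under the assumption that the parsed config is a plain dictionary and that a section with activate set to false counts as not applicable; the function name `unknown_diff_groups` is hypothetical.

```python
def unknown_diff_groups(config):
    """Return diff_analysis groups that no active upstream section defines."""
    known = set()
    for section in ("normalize", "integrate", "weighted_nn"):
        body = config.get(section, {})
        # normalize has no activate key, so it is treated as always active.
        if body.get("activate", True):
            known.update(body.get("groups") or {})
    requested = config.get("diff_analysis", {}).get("groups") or {}
    return [name for name in requested if name not in known]


example = {
    "normalize": {"groups": {"unintegrated_0": {}}},
    "integrate": {"activate": False, "groups": {"integrated_0": {}}},
    "weighted_nn": {"activate": True, "groups": {"wnn_0": {}}},
    "diff_analysis": {
        "activate": True,
        "groups": {"unintegrated_0": {}, "integrated_0": {}, "wnn_0": {}},
    },
}
print(unknown_diff_groups(example))
```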

Example:

diff_analysis:
  activate: true
  groups:
    unintegrated_0:
      cluster_idents: seurat_clusters
      assay: null
      slot: data
      min_pct: null
      test_use: null
      latent_vars: null
      alpha: 0.05
    unintegrated_1:
      cluster_idents: seurat_clusters
      assay: null
      slot: data
      min_pct: 0.2
      test_use: null
      latent_vars: 'nCount_Peaks'
      alpha: 0.05
    unintegrated_2:
      cluster_idents: seurat_clusters
      assay: null
      slot: data
      min_pct: 0.2
      test_use: null
      latent_vars: 'nCount_MACS'
      alpha: 0.05
    unintegrated_3:
      cluster_idents: seurat_clusters
      assay: null
      slot: data
      min_pct: 0.2
      test_use: null
      latent_vars: 'nCount_Gene.Activity'
      alpha: 0.05
    integrated_0:
      cluster_idents: seurat_clusters
      assay: null
      slot: data
      min_pct: null
      test_use: null
      latent_vars: null
      alpha: 0.05
    integrated_1:
      cluster_idents: seurat_clusters
      assay: null
      slot: data
      min_pct: 0.2
      test_use: null
      latent_vars: 'nCount_Peaks'
      alpha: 0.05
    integrated_3:
      cluster_idents: seurat_clusters
      assay: null
      slot: data
      min_pct: 0.2
      test_use: null
      latent_vars: 'nCount_Gene.Activity'
      alpha: 0.05
    wnn_0:
      cluster_idents: seurat_clusters
      assay: SCT
      slot: data
      min_pct: null
      test_use: null
      latent_vars: null
      alpha: 0.05
    wnn_1:
      cluster_idents: seurat_clusters
      assay: SCT
      slot: data
      min_pct: null
      test_use: null
      latent_vars: null
      alpha: 0.05

Example

A basic example of a config.yaml file using 2 Multiome batches is provided below. The analysis will be performed on all samples with and without integration, followed by clustering and differential testing. This example also includes automated optimization of clustering parameters.

See Overview of workflows for more detailed examples of config files.

samples: config/multiome-config/samples.tsv

aggregates: config/multiome-config/aggregates.tsv

assays: config/multiome-config/assays.tsv

ANNOTATION: "EnsDb"
ANNO_FILE: "path/to/genes.gtf.gz"

qc:
  remove_outliers: true
  rm_outliers_method: sd
  meta_labels:
    - nCount_Gene.Expression
    - nCount_Peaks
    - percent.mt
    - TSS.enrichment
  lower:
    nCount_Gene.Expression: 100
    nCount_Peaks: 1000
    TSS.enrichment: 2
  upper: null

macs2:
  run: "Y"
  group_fragments_by: genome
  fasta: "../reference/genome.fa"
  chromsizes: "../reference/multiome.chromsizes"

normalize:
  split_by: meta_geno
  groups:
    unintegrated_0:
      assay_name: Gene.Expression
      norm_method: sct
    unintegrated_1:
      assay_name: Peaks
      norm_method: lsi
    unintegrated_2:
      assay_name: MACS
      norm_method: lsi
    unintegrated_3:
      assay_name: Gene.Activity
      norm_method: log

integrate:
  activate: true
  atac_integrate_embeddings: true
  split_by: meta_geno
  groups:
    integrated_0:
      assay_name: Gene.Expression
      norm_method: sct
      integrate_method: CCAIntegration
      integrate_dims:
        - 1
        - 30
    integrated_1:
      assay_name: Peaks
      norm_method: lsi
      integrate_method: rlsi
      integrate_dims:
        - 1
        - 30
    integrated_3:
      assay_name: Gene.Activity
      norm_method: log
      integrate_method: CCAIntegration
      integrate_dims:
        - 1
        - 30

dataset_size:
  toydataset: false
  toy_k: 10

chooser:
  groups:
    unintegrated_0:
      npcs: 25
    unintegrated_1:
      npcs: 20
    unintegrated_2:
      npcs: 20
    unintegrated_3:
      npcs: 20
    integrated_0:
      npcs: 25
    integrated_1:
      npcs: 20
    integrated_2:
      npcs: 20
    integrated_3:
      npcs: 20
  resolutions:
    - 0.6
    - 0.8
    - 1
    - 1.2
    - 1.4
  silhouette:
    - silhouette
    - frequency_grouped
    - silhouette_grouped

cluster:
  detection_method: 3
  resolution: null

weighted_nn:
  activate: true
  groups:
    wnn_0:
      input_groups:
        - unintegrated_0 # corresponds to SCT
        - unintegrated_1 # corresponds to Peaks
      reduction:
        - pca
        - lsi
      umap_dims:
        - - 1
          - 25
        - - 1
          - 20
      resolution: 0.6
      detection_method: 3
    wnn_1:
      input_groups:
        - integrated_0 # corresponds to SCT
        - integrated_1 # corresponds to Peaks
      reduction:
        - integrated_pca
        - integrated_lsi
      umap_dims:
        - - 1
          - 25
        - - 1
          - 20
      resolution: 0.6
      detection_method: 3

diff_analysis:
  activate: true
  groups:
    unintegrated_0:
      cluster_idents: seurat_clusters
      assay: null
      slot: data
      min_pct: null
      test_use: null
      latent_vars: null
      alpha: 0.05
    unintegrated_1:
      cluster_idents: seurat_clusters
      assay: null
      slot: data
      min_pct: 0.2
      test_use: null
      latent_vars: 'nCount_Peaks'
      alpha: 0.05
    unintegrated_2:
      cluster_idents: seurat_clusters
      assay: null
      slot: data
      min_pct: 0.2
      test_use: null
      latent_vars: 'nCount_MACS'
      alpha: 0.05
    unintegrated_3:
      cluster_idents: seurat_clusters
      assay: null
      slot: data
      min_pct: 0.2
      test_use: null
      latent_vars: 'nCount_Gene.Activity'
      alpha: 0.05
    integrated_0:
      cluster_idents: seurat_clusters
      assay: null
      slot: data
      min_pct: null
      test_use: null
      latent_vars: null
      alpha: 0.05
    integrated_1:
      cluster_idents: seurat_clusters
      assay: null
      slot: data
      min_pct: 0.2
      test_use: null
      latent_vars: 'nCount_Peaks'
      alpha: 0.05
    integrated_3:
      cluster_idents: seurat_clusters
      assay: null
      slot: data
      min_pct: 0.2
      test_use: null
      latent_vars: 'nCount_Gene.Activity'
      alpha: 0.05
    wnn_0:
      cluster_idents: seurat_clusters
      assay: SCT
      slot: data
      min_pct: null
      test_use: null
      latent_vars: null
      alpha: 0.05
    wnn_1:
      cluster_idents: seurat_clusters
      assay: SCT
      slot: data
      min_pct: null
      test_use: null
      latent_vars: null
      alpha: 0.05