.. _tutorial:

How to get started
==================

Step 1: Installing `multiome-wf`
--------------------------------

Running `multiome-wf` requires cloning the Snakemake pipeline to your local
system and installing the necessary packages into a conda environment.
Follow the instructions on the :ref:`setup` page. If you are new to
`multiome-wf`, please review the directory layout described in
:ref:`pipeline-structure` before starting configuration.

Step 2: Configuring sample tables
---------------------------------

If your input files were created per sample, complete the ``samples.tsv``
in your configuration folder. If your input files aggregate all samples,
fill out the ``aggregates.tsv`` in addition to ``samples.tsv``. Refer to
:ref:`config-general` for the hierarchy of configuration directories and
files.

.. note::

    `multiome-wf` organizes sample information by modality. Any sample
    tables not related to your analysis modality will be disregarded. For
    example, the sample table for Multiome should be placed in the
    ``multiome-config`` directory.

Step 3: Configuring the Snakemake pipeline
------------------------------------------

`multiome-wf` is designed to use the ``config.yaml`` file in your
configuration folder. Modify this file according to the instructions in
:ref:`config-yaml`.

.. note::

    `multiome-wf` assumes that your ``config.yaml`` and sample tables are
    located in the same configuration directory. Any ``config.yaml`` files
    not related to your analysis modality will be disregarded.

Step 4: Updating ``Snakefile``
------------------------------

Once you have finished configuring your sample tables and ``config.yaml``,
edit your ``Snakefile`` accordingly. First, point ``configfile`` to your
``config.yaml`` file, as shown below:

.. code-block:: python

    configfile: "config/multiome-config/config.yaml"

Depending on the computational resources required for each rule, update the
``resources`` directive, as shown below:

.. code-block:: python

    resources:
        mem_mb=1024 * 100,
        disk_mb=1024 * 50,
        runtime=60 * 12

.. note::

    If the required resources are difficult to estimate up front, you can
    update them after encountering out-of-resource errors during the run.

.. _configure-hpc:

Step 5 (Optional): Configuring high performance computing
---------------------------------------------------------

Optionally, users can run `multiome-wf` on a High Performance Computing
(HPC) cluster using Snakemake's cluster execution feature. Refer to
:ref:`cluster` for more details. Once you have finished configuring your
sample tables and ``config.yaml`` and updated your ``Snakefile``
accordingly, set the ``--configfile`` parameter in the ``WRAPPER_SLURM``
file to the path of your ``config.yaml`` file, as shown below:

.. code-block:: bash

    # Run snakemake -- UPDATE the --configfile path below to match your setup!
    (
        time snakemake \
            -p \
            --directory "$PWD" \
            -k \
            --restart-times 3 \
            --rerun-incomplete \
            --rerun-triggers mtime \
            --jobname "s.{rulename}.{jobid}.sh" \
            -j 999 \
            --use-conda \
            --configfile config/multiome-config/config.yaml \
            $PROFILE_CMD \
            --latency-wait=300 \
            --max-jobs-per-second 1 \
            --max-status-checks-per-second 0.01 \
            "$@"
    ) > "Snakefile.log" 2>&1

.. note::

    The default configuration of `multiome-wf` is optimized for `NIH's
    Biowulf <https://hpc.nih.gov/>`_ and may not be suitable for other HPC
    systems. Users may need additional configuration, including
    modifications to ``WRAPPER_SLURM``, depending on their computing
    environment.
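The ``$PROFILE_CMD`` variable in the wrapper above is presumably how a
Snakemake profile gets passed to the scheduler. As a minimal sketch only --
assuming a profile directory named ``profile/`` next to your ``Snakefile``,
which is a hypothetical layout rather than anything `multiome-wf`
prescribes -- it could be defined like this:

.. code-block:: bash

    # Hypothetical sketch of how $PROFILE_CMD might be set in WRAPPER_SLURM.
    # The profile/ directory name is a placeholder; use whatever Snakemake
    # profile your cluster provides.
    if [[ -d "$PWD/profile" ]]; then
        PROFILE_CMD="--profile $PWD/profile"  # --profile is a standard Snakemake flag
    else
        PROFILE_CMD=""                        # no profile: run with the flags above only
    fi

Consult your cluster's documentation for a site-specific profile; the
defaults shipped with `multiome-wf` target Biowulf, as noted above.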
Step 6: Run
-----------

Once everything is ready, users can run the workflow with or without
cluster computing. Assuming you are in the same directory as your
``Snakefile``, use the following command in your terminal to run without
submitting a cluster job:

.. code-block:: bash

    # Activate conda environment
    conda activate ../env

    # Run Snakemake
    snakemake --cores 8

If you wish to run `multiome-wf` on a cluster and have configured the
pipeline for cluster computing, activate your conda environment and use the
following command in the terminal:

.. code-block:: bash

    # Run Snakemake by submitting jobs
    sbatch WRAPPER_SLURM

.. note::

    The above command is optimized for job submission on NIH's Biowulf.
    Consult your HPC documentation or administrator for guidance on running
    Snakemake on a cluster, if necessary.

Step 7: Results
---------------

Upon completion of a run, `multiome-wf` creates a ``results`` folder in the
``workflow`` directory. All output files are saved in the ``results``
folder. Refer to :ref:`overview-output` for more information.
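To inspect the outputs quickly, a couple of optional commands can help.
This is only an illustrative sketch: the ``workflow/results`` path assumes
the layout described above, and ``snakemake --report`` is a generic
Snakemake feature (run it from the ``Snakefile`` directory after a
completed run), not something specific to `multiome-wf`:

.. code-block:: bash

    # List the top level of the results tree
    # (path assumes results/ sits inside workflow/, per the layout above)
    ls workflow/results

    # Optionally, generate Snakemake's self-contained HTML report of the run
    snakemake --report report.html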