Executing the NovaScope Pipeline
Preliminary Steps
Tip
Before running the full pipeline, we highly recommend a sanity check in the form of a dry run. A dry run verifies that your config_job.yaml is properly configured and lists the jobs that would be executed.
Tip
You can also generate a rule graph, which shows the structure of the workflow at the rule level, or a directed acyclic graph (DAG), which shows every job and its actual dependencies.
The commands below perform a dry run and generate both visualizations.
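A minimal sketch of a dry run and the two visualizations, wrapped in shell functions so they can be sourced and called one at a time. The paths, the Snakefile name `NovaScope.smk`, and the function names are assumptions for illustration; `-s`, `-d`, `--dry-run`, `-p`, `--rulegraph`, and `--dag` are standard Snakemake options, and rendering assumes Graphviz `dot` is installed.

```shell
# Hypothetical locations; replace them with your own paths.
smk_dir="${SMK_DIR:-/path/to/NovaScope}"      # NovaScope repository
job_dir="${JOB_DIR:-/path/to/job_directory}"  # assumed to hold config_job.yaml
snakefile="$smk_dir/NovaScope.smk"            # assumed Snakefile name

# Dry run: list the jobs (-p also prints their shell commands) without running them.
novascope_dry_run() {
    snakemake -s "$snakefile" -d "$job_dir" --rerun-incomplete --dry-run -p
}

# Rule graph: one node per rule, rendered to PDF with Graphviz.
novascope_rulegraph() {
    snakemake -s "$snakefile" -d "$job_dir" --rulegraph | dot -Tpdf > "${1:-rulegraph.pdf}"
}

# DAG: one node per job, showing the actual dependencies between jobs.
novascope_dag() {
    snakemake -s "$snakefile" -d "$job_dir" --dag | dot -Tpdf > "${1:-dag.pdf}"
}
```

After sourcing the file, `novascope_dry_run` prints the planned jobs, and `novascope_rulegraph my_rules.pdf` writes the rule graph to the given file.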
Execution Options
The examples below use --rerun-incomplete, which makes the pipeline re-run any job whose output is identified as incomplete, and --latency-wait, which makes the pipeline wait up to the specified number of seconds for an output file that is not immediately visible after a job finishes, compensating for filesystem latency. Both options are OPTIONAL. For more options, see the Rule Execution Guide and the official Snakemake documentation.
Option A: Local Execution
If your computing environment does not require a job scheduler such as Slurm, you can run the pipeline locally. An example script is provided below. Replace the variables with the relevant paths, the number of cores, and the latency wait time.
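A minimal sketch of a local run, under the same assumptions as elsewhere in this guide: the paths, the Snakefile name `NovaScope.smk`, the default values, and the function name are hypothetical, while `--cores`, `--rerun-incomplete`, and `--latency-wait` are standard Snakemake options.

```shell
# Hypothetical locations and example values; adjust them to your environment.
smk_dir="${SMK_DIR:-/path/to/NovaScope}"      # NovaScope repository
job_dir="${JOB_DIR:-/path/to/job_directory}"  # assumed to hold config_job.yaml
snakefile="$smk_dir/NovaScope.smk"            # assumed Snakefile name
num_cores="${NUM_CORES:-4}"                   # number of cores to use (example)
wait_time="${WAIT_TIME:-120}"                 # latency wait in seconds (example)

# Run the whole pipeline on the local machine, without a job scheduler.
novascope_local() {
    snakemake -s "$snakefile" -d "$job_dir" \
        --cores "$num_cores" \
        --rerun-incomplete \
        --latency-wait "$wait_time"
}
```

Calling `NUM_CORES=8 WAIT_TIME=60` before sourcing the file changes the two tunable values without editing it.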
The following examples show how to execute the pipeline locally:
Option B: Slurm using a Master Job
Tip
If your computing environment supports a job scheduler such as Slurm, the recommended approach is to submit a 'Master Job' that oversees and manages the status of all other jobs.
First, make sure the Slurm configuration file is available. The --latency-wait and --rerun-incomplete options are preset in the example Slurm configuration file, so you do not need to specify them manually.
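As a quick check before submitting anything, the sketch below verifies that a profile presets both options. It assumes the Slurm configuration follows the usual Snakemake profile layout, that is, a directory containing `config.yaml` whose keys are long option names without the leading dashes; the function name is hypothetical.

```shell
# Check that a Snakemake profile directory presets both options.
# Usage: check_slurm_profile /path/to/profile_directory
check_slurm_profile() {
    cfg="$1/config.yaml"
    if [ ! -f "$cfg" ]; then
        echo "missing: $cfg" >&2
        return 1
    fi
    for key in rerun-incomplete latency-wait; do
        # Match an uncommented "key:" entry at the start of a line.
        if ! grep -Eq "^[[:space:]]*$key[[:space:]]*:" "$cfg"; then
            echo "not preset in $cfg: $key" >&2
            return 1
        fi
    done
    echo "ok: $cfg"
}
```

If the check reports a missing key, either add it to the profile or pass the corresponding option on the command line.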
Next, create the master job, whose role is to monitor the progress of all tasks and handle job submissions. Create a file similar to the example below. The details may vary with your computing environment.
Warning
The master job requires minimal memory but an extended time limit, long enough for all related jobs to be submitted and completed. If the master job ends early, NovaScope exits and unfinished jobs are neither executed nor tracked.
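A minimal sketch of such a master job, written to a file with a here-document. The file name, job name, resource values, paths, and the Snakefile name `NovaScope.smk` are assumptions for illustration; the `#SBATCH` directives and `--profile` are standard Slurm and Snakemake options. Site-specific directives such as an account or partition are deliberately left out.

```shell
# Write a hypothetical master job script; nothing is submitted here.
cat > novascope_master_job.sh <<'EOF'
#!/bin/bash
# The master job needs little memory but a long time limit, because it must
# stay alive until every job it submits has finished (values are examples).
#SBATCH --job-name=novascope_master
#SBATCH --output=novascope_master_%j.log
#SBATCH --time=72:00:00
#SBATCH --mem=2G
#SBATCH --cpus-per-task=1
# Add site-specific directives (for example an account or partition) here.

# Hypothetical locations; replace them with your own paths.
smk_dir="/path/to/NovaScope"
job_dir="/path/to/job_directory"
slurm_params="/path/to/slurm_profile"

# Activate the environment that provides snakemake here, if needed.

# The profile supplies the Slurm settings, including the preset
# --latency-wait and --rerun-incomplete options.
snakemake -s "$smk_dir/NovaScope.smk" -d "$job_dir" --profile "$slurm_params"
EOF
# The resulting file is what gets passed to sbatch.
```

Keeping the script in its own file makes it easy to adjust the time limit per dataset without touching the pipeline itself.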
Specific examples prepared for the three datasets are provided below:
Then submit the master job through sbatch:
Option C: Slurm via Command Lines
For a small number of quick jobs, you can execute NovaScope with Slurm using a single command line without a master job.
This is similar to local execution, but you need to specify the Slurm profile. Make sure the Slurm configuration file is ready before proceeding. The --latency-wait and --rerun-incomplete options are preset in the example Slurm configuration file.
Warning
If you are logged out before all jobs have been submitted to Slurm, the jobs that have not yet been submitted will never be submitted.
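A minimal sketch of running NovaScope through Slurm from the command line, assuming a Snakemake profile directory that holds the Slurm configuration. The paths, the Snakefile name `NovaScope.smk`, and the function name are hypothetical; `--profile` is a standard Snakemake option.

```shell
# Hypothetical locations; replace them with your own paths.
smk_dir="${SMK_DIR:-/path/to/NovaScope}"             # NovaScope repository
job_dir="${JOB_DIR:-/path/to/job_directory}"         # assumed to hold config_job.yaml
slurm_params="${SLURM_PARAMS:-/path/to/slurm_profile}"  # directory with the Slurm configuration
snakefile="$smk_dir/NovaScope.smk"                   # assumed Snakefile name

# Submit jobs through Slurm directly from the current shell session.
# --latency-wait and --rerun-incomplete come from the profile, so they are
# not repeated here.
novascope_slurm() {
    snakemake -s "$snakefile" -d "$job_dir" --profile "$slurm_params"
}
```

Because the command runs in the current session, it keeps submitting jobs only for as long as that session stays open.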