Running with Your Own Data

To run the pipeline on your own dataset with Docker or Singularity, follow the same process as in the examples provided, but point the --configfile parameter to your own job config file.

Preparing the job config file

Typically, you will keep your config file in your working directory and pass its path on the command line. For example, if your config file is located at ${working_dir}/config_job.yaml and the working directory is mounted to /data in the container, as in the examples, you can run the pipeline with the argument --configfile /data/config_job.yaml.
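The host-to-container path mapping described above can be sketched as a small shell helper. This is only an illustration: the function name to_container_path is hypothetical and not part of NovaScope, and it assumes the working directory is mounted to /data as in the examples.

```shell
#!/bin/sh
# Assumption: replace with your working directory (the one mounted to /data).
WORKING_DIR="${WORKING_DIR:-/path/to/working/dir}"

# Hypothetical helper: translate a host path under WORKING_DIR
# into the path the container sees under /data.
to_container_path() {
    case "$1" in
        "${WORKING_DIR}"/*)
            # Strip the host prefix and re-root the remainder at /data.
            printf '/data/%s\n' "${1#"${WORKING_DIR}"/}"
            ;;
        *)
            echo "not under ${WORKING_DIR}: $1" >&2
            return 1
            ;;
    esac
}

to_container_path "${WORKING_DIR}/config_job.yaml"   # prints /data/config_job.yaml
```

Any file outside the mounted working directory is not visible in the container, which is why the helper rejects such paths instead of guessing.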

The full instructions on how to prepare your job config file are provided in the Job Configuration section. If you are running NovaScope in a Docker or Singularity container, here are some tips for modifying your job config file:

It would be easiest to start with the Shallow Liver Section Configuration File.
We recommend placing the input FASTQ files in the ${working_dir}/input directory, writing the output files to the ${working_dir}/output directory, and mounting ${working_dir} to /data in the container.
Set env_yml to the file that already exists in the container, /app/novascope/info/config_env_docker.yaml, so that you do not have to set up your own environment file.
Please see the Job Configuration section to understand how to update the rest of the input parameters.
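The tips above could be applied with a few shell commands, as in the following minimal sketch. It assumes that env_yml is a top-level key on its own line in the job config file; the default directory name novascope_work and the placeholder config line are hypothetical and exist only to make the sketch runnable.

```shell
#!/bin/sh
# Assumption: replace with your real working directory.
WORKING_DIR="${WORKING_DIR:-$PWD/novascope_work}"

# Recommended layout: FASTQ files go in input/, results in output/.
mkdir -p "${WORKING_DIR}/input" "${WORKING_DIR}/output"

CONFIG="${WORKING_DIR}/config_job.yaml"

# Placeholder only: in practice, start from your copy of the example config file.
if [ ! -f "${CONFIG}" ]; then
    printf 'env_yml: /path/to/your/config_env.yaml\n' > "${CONFIG}"
fi

# Point env_yml at the environment file shipped inside the container.
# A temporary file is used instead of "sed -i" for portability.
sed 's|^env_yml:.*|env_yml: /app/novascope/info/config_env_docker.yaml|' \
    "${CONFIG}" > "${CONFIG}.tmp" && mv "${CONFIG}.tmp" "${CONFIG}"

# Show the resulting setting.
grep '^env_yml:' "${CONFIG}"
```

The remaining parameters still need to be edited by hand as described in the Job Configuration section; paths inside the config should refer to /data, not to the host directory.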

Running the Docker/Singularity container

Before a full run, you can perform a dry-run to check that the NovaScope pipeline is set up correctly for your own data.

For example, if you are running a Docker container,

## Test the NovaScope pipeline with dry-run
## NOTE: make sure to replace /path/to/working/dir/ with your working directory
docker run -it --rm -v /path/to/working/dir:/data hyunminkang/novascope \
    -s /app/novascope/NovaScope.smk \
    --rerun-incomplete -d /data/output \
    --configfile /data/config_job.yaml \
    --dry-run -p

If you are running a Singularity container,

## Test the NovaScope pipeline with dry-run
## NOTE: make sure to replace /path/to/working/dir/ with your working directory
singularity exec --bind /path/to/working/dir:/data novascope_latest.sif \
    snakemake -s /app/novascope/NovaScope.smk \
    --rerun-incomplete -d /data/output \
    --configfile /data/config_job.yaml \
    --dry-run -p

If the dry-run is successful, you can run the full pipeline by replacing --dry-run with --cores [num-cpus], where [num-cpus] is the number of CPU cores to use.
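As an illustration, a full Docker run might look like the sketch below. The core count of 4 is an arbitrary example value, and the PREVIEW switch and the function name novascope_full_run are hypothetical conveniences, not part of NovaScope: with PREVIEW=1 (the default here) the command is only printed so you can review it first.

```shell
#!/bin/sh
WORKING_DIR="${WORKING_DIR:-/path/to/working/dir}"  # replace with your working directory
NUM_CPUS="${NUM_CPUS:-4}"                            # example value; choose what fits your machine
PREVIEW="${PREVIEW:-1}"                              # hypothetical switch: 1 prints, 0 runs

novascope_full_run() {
    # Arguments follow the dry-run example, with --dry-run replaced by --cores.
    # -d is written as an absolute path under the /data mount.
    set -- docker run -it --rm -v "${WORKING_DIR}:/data" hyunminkang/novascope \
        -s /app/novascope/NovaScope.smk \
        --rerun-incomplete -d /data/output \
        --configfile /data/config_job.yaml \
        --cores "${NUM_CPUS}" -p
    if [ "${PREVIEW}" = "1" ]; then
        echo "$@"    # show the command without executing it
    else
        "$@"         # execute the command
    fi
}

novascope_full_run
```

Setting PREVIEW=0 executes the command. For Singularity, the same substitution applies to the singularity exec command shown earlier.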

If your data contains human samples, you may need to download the GRCh38 reference files. You can download the reference files by running the following commands:

## NOTE: make sure to replace /path/to/working/dir/ with your working directory
cd /path/to/working/dir/
wget https://zenodo.org/records/11181586/files/GRCh38_star_2_7_11b.tar.gz
tar xzvf GRCh38_star_2_7_11b.tar.gz
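If the download is interrupted, the archive may be incomplete, so it can be worth checking it before extracting. The following is a minimal sketch using gzip -t, which tests the integrity of the compressed stream; the function name check_archive is hypothetical, and the check does not replace a checksum comparison.

```shell
#!/bin/sh
ARCHIVE="${ARCHIVE:-GRCh38_star_2_7_11b.tar.gz}"

# Hypothetical helper: succeed only if the gzip stream is intact.
check_archive() {
    # gzip -t reads the whole file and fails on a truncated or corrupted stream.
    if gzip -t "$1" 2>/dev/null; then
        echo "OK: $1"
    else
        echo "Corrupt or incomplete: $1" >&2
        return 1
    fi
}

# Only check when the archive is present in the current directory.
if [ -f "${ARCHIVE}" ]; then
    check_archive "${ARCHIVE}"
fi
```

If the check fails, downloading the file again (for example with wget -c to resume a partial download) is usually the simplest remedy.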