Setting up an Environment YAML File¶
NovaScope requires a YAML file to configure the environment. This environment configuration file (config_env.yaml) specifies the paths to the required tools, reference databases, and Python environment.
Below is a brief description of each item in the YAML file.
Tip
To create your own config_env.yaml file, you may copy the example available in our GitHub repository. Remember to replace the placeholders with values that match your own setup.
Tools¶
Any tool that is not defined in this section is looked up automatically on the system path, so it can be used without manual configuration.
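As a rough illustration of the shape of this section, the sketch below lists a few tools with placeholder paths. Only the tools section name and the samtools entry are taken from this page; the star and ficture key names and every path are assumptions made for illustration, so check the example config_env.yaml in the repository for the exact keys.

```yaml
# Hypothetical sketch: key names other than `tools` and `samtools` are assumed.
tools:
  samtools: /path/to/samtools   # omit if samtools is on the system path or loaded via envmodules
  star: /path/to/STAR           # assumed key for the STAR aligner
  ficture: /path/to/ficture     # assumed key for the FICTURE installation
```

Any tool left out of this block is expected to be found on the system path.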
samtools
For users in High-Performance Computing (HPC) environments with samtools installed, envmodules (see Environment Modules) can be used to load samtools rather than defining its path here.
(Optional) Environment Modules¶
Info
Only applicable to HPC environments. For local executions, remove this section from config_env.yaml.
HPC users can use the envmodules section to load the required software tools as modules. If a tool is not listed in the envmodules section, the pipeline assumes it is installed system-wide.
Tip
The version information is required.
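A minimal sketch of an envmodules section is shown here. The module names and version strings are hypothetical and depend entirely on what your cluster provides; the python and samtools keys and the use of && to chain modules follow the descriptions on this page.

```yaml
# Hypothetical sketch: module names and versions are placeholders for your cluster's modules.
envmodules:
  python: "python/3.9.12"                       # include only if the Python environment was built with this module
  samtools: "Bioinformatics && samtools/1.13"   # modules are loaded left to right, joined by &&
```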
python
If your Python environment was set up using a Python accessed through a module, specify the same Python module in the envmodules section to maintain the environment. If using a local Python installation (not through module load), DO NOT INCLUDE any Python module here.
samtools
Using envmodules to load samtools can be an alternative to specifying its path in tools. The given example is designed for instances where samtools is integrated into the Bioinformatics module system, which requires loading the Bioinformatics module before loading samtools. In this case, list all modules that must be loaded, in the correct order, joined by &&.
Reference Databases¶
Specify all reference databases required for the input species in the ref field.
(1) Reference Genome Index for Alignment¶
Use the align parameter to define the reference genome index for alignment using STAR. These reference genome indices can be downloaded from the cellranger download page. Users can also generate their own reference genome index; detailed instructions for building the STAR index from a reference file are provided in the Requirements section.
(2) Reference Gene List Directories for Visualizing Spatial Expression Patterns¶
The genelists parameter identifies the directory containing gene lists specific to the species of the input data. These gene lists are used to visualize spatial expression patterns of particular gene groups within Rule sdge_visual.
The directory should contain gene list files, each corresponding to a specific type or group of genes, such as mitochondrial (MT) genes. Each file should be named <gene_group>.genes.tsv (for example, MT.genes.tsv) and should list gene names, one name per line.
NovaScope provides precompiled gene lists for mouse (version: mm39) and human (version: hg38), available in the "info" folder. If the genelists parameter is not specified in config_env.yaml, NovaScope defaults to using these files. Alternatively, users may provide their own custom gene list files.
(3) (Optional) Reference Gene Information for Gene Filtering¶
This parameter is required only if the additional reformatting features of NovaScope are used. The geneinfo parameter specifies the location of the gene information files used for gene filtering. These files are available from FICTURE in its info directory. If the geneinfo field in the ref section is omitted but the tools field specifies the path to FICTURE, NovaScope automatically uses the gene information files from FICTURE. Users may also supply their own custom gene information files.
Tip
Please ensure the reference files correspond to the species of your input data.
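The three parameters described above might be laid out roughly as follows. This is a sketch under assumptions: the ref, align, genelists, and geneinfo names come from this page, but the per-species nesting (mouse, human) and all paths are hypothetical and should be checked against the example config_env.yaml in the repository.

```yaml
# Hypothetical sketch: the per-species sub-keys and all paths are assumptions.
ref:
  align:                                    # STAR reference genome index
    mouse: /path/to/mouse/star_index
    human: /path/to/human/star_index
  genelists:                                # directory holding <gene_group>.genes.tsv files
    mouse: /path/to/mouse/genelists
    human: /path/to/human/genelists
  geneinfo:                                 # optional; used for gene filtering
    mouse: /path/to/mouse/gene_info_file
    human: /path/to/human/gene_info_file
```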
Python Environment¶
Specify the path to the Python virtual environment in config_env.yaml.
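For illustration only, such an entry could be a single line like the one below. The key name pyenv is an assumption, as is the placeholder path; use the key given in the example config_env.yaml in the repository.

```yaml
# Hypothetical sketch: the key name `pyenv` is assumed; replace the path with your own.
pyenv: /path/to/python/virtual/environment
```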
(Optional) Computing Capabilities¶
Info
Only applicable to HPC environments.
NovaScope provides two methods for specifying resources for the alignment process:
- Option stdin allows users to define resources manually in the job configuration file.
- Option filesize allows NovaScope to automatically allocate resources based on the size of the input files and the available computational resources defined in this environment configuration file.
Users must specify the available computing resources ONLY when using Option filesize.
For more information on activating Option stdin or filesize and the resource allocation strategy for Option filesize, visit the Job Configuration page.
For an example of how to configure these settings, refer to the example config_env.yaml in our GitHub repository.
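As a rough sketch of what describing available resources for Option filesize could look like, the block below lists two cluster partitions with their CPU and memory limits. Every key name here (available_nodes, partition, max_n_cpus, mem_per_cpu) and every value is an assumption made for illustration and is not confirmed by this page.

```yaml
# Hypothetical sketch: all key names and values below are assumptions.
available_nodes:
  - partition: standard    # name of a scheduler partition on your cluster
    max_n_cpus: 20         # largest number of CPUs a job may request on this partition
    mem_per_cpu: 7g        # memory available per CPU
  - partition: largemem
    max_n_cpus: 10
    mem_per_cpu: 25g
```

Listing more than one partition would let the allocation strategy pick a partition that fits the input file size, assuming the pipeline supports several entries.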