Expected Output from NovaScope
Output Directory Structure
The directory specified by the output parameter in config_job.yaml is organized into the subdirectories described in the sections below.
seq1st
The seq1st directory is structured for organizing 1st sequencing FASTQ files and spatial barcode maps. It includes:
- A fastqs subdirectory for all input 1st sequencing FASTQ files via symlink.
- Two subdirectories for spatial barcode maps:
  - sbcds for maps of individual tiles from the 1st sequencing,
  - nbcds for a map organized on a per-chip basis, used in later processing.
seq2nd
The seq2nd directory holds all input 2nd sequencing FASTQ files via symlinks. Each pair is placed in its own folder, named after the 2nd sequencing ID provided in the job configuration file.
For example, with two pairs of input 2nd sequencing FASTQ files, the seq2nd directory contains two folders, one per 2nd sequencing ID, each holding the symlinks to its pair of FASTQ files.
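The per-pair layout described above can be sketched in Python. This is a minimal illustration under stated assumptions, not NovaScope's own implementation: the function name link_seq2nd_pairs is hypothetical, and reusing the source file names for the symlinks is an assumption.

```python
import os
from pathlib import Path


def link_seq2nd_pairs(output_dir, pairs):
    """Symlink each FASTQ pair into <output_dir>/seq2nd/<seq2nd_id>/.

    `pairs` maps a 2nd sequencing ID to a (read1, read2) tuple of paths.
    Link names reuse the source file names (an assumption for illustration).
    """
    seq2nd_dir = Path(output_dir) / "seq2nd"
    created = []
    for seq2_id, fastqs in pairs.items():
        # One folder per 2nd sequencing ID.
        pair_dir = seq2nd_dir / seq2_id
        pair_dir.mkdir(parents=True, exist_ok=True)
        for fastq in fastqs:
            src = Path(fastq).resolve()
            link = pair_dir / src.name
            # Skip links that already exist so the function can be re-run.
            if not link.is_symlink():
                os.symlink(src, link)
            created.append(link)
    return created
```

Calling the function with two entries in `pairs` yields two folders under seq2nd, each containing two symlinks.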
match
The match directory houses the outcomes of aligning second sequencing reads with spatial barcodes for the corresponding chip section.
histology
The histology directory is designated for holding both the input histology file and the histology images aligned with the spatial coordinates of the SGE.
align
The align directory encompasses several subdirectories, including:
- bam for alignment outcomes such as the BAM file, summary metrics, and visualizations;
- sge for a spatial gene expression (SGE) matrix and visualizations.
analysis
The analysis directory includes three subdirectories, mainly for reformatting the SGE matrix:
- sgeAR for the SGE matrix before reformatting, where "AR" stands for analysis-ready,
- preprocess for the reformatted and filtered SGE matrices, the filtered feature file, and meta files for coordinates,
- segment for the hexagon-indexed SGE.
The sgeAR Subfolder and Manual Preprocessing
The sgeAR subfolder hosts the input SGE matrices that require reformatting. It is particularly useful when users wish to manually preprocess an SGE matrix, for example by applying boundary filtering, before it undergoes reformatting.
To manually preprocess an SGE matrix:
- Preprocess the SGE matrix: Users must manually preprocess the SGE matrix according to their specific requirements.
- Name the dataset: After preprocessing, the dataset should be named and referred to as unit_id.
- Save the preprocessed SGE matrix: Place the manually preprocessed SGE matrix in the sgeAR subfolder.
- Prepare a coordinate meta file: Prepare a barcodes.minmax.tsv with the minimum and maximum of the X and Y coordinates in the sgeAR subfolder.
- Update the job configuration file: Provide the unit_id in the job configuration file to ensure it is recognized in subsequent processing steps.
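The coordinate meta file step can be sketched as below. This is only an illustration: the helper name write_minmax is hypothetical, and the header row and column order (xmin, xmax, ymin, ymax) are assumptions, since the exact layout NovaScope expects for barcodes.minmax.tsv is not stated here. Check it against a file produced by NovaScope before relying on it.

```python
import csv
from pathlib import Path


def write_minmax(coords, sgear_dir, filename="barcodes.minmax.tsv"):
    """Write the min/max of X and Y for an iterable of (x, y) pairs.

    The column names and order are assumptions made for illustration.
    """
    coords = list(coords)
    if not coords:
        raise ValueError("no coordinates given")
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    row = {"xmin": min(xs), "xmax": max(xs), "ymin": min(ys), "ymax": max(ys)}

    out_path = Path(sgear_dir) / filename
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(row), delimiter="\t")
        writer.writeheader()
        writer.writerow(row)
    return row
```

The coordinates passed in would be those of the barcodes that remain after manual preprocessing, so that the recorded bounds match the filtered matrix.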
Automatic Handling:
If reformatting features are requested without an SGE matrix having been manually prepared in the sgeAR subfolder as outlined above, NovaScope automatically generates a unit_id. It then links the original SGE matrix from the sge subdirectory into sgeAR, so that processing can continue without manual steps.
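The fallback behavior can be pictured with the following sketch. It is a simplified model, not NovaScope's code: the function name resolve_sgear, the placeholder ID "auto_unit", and the use of a missing unit_id as the trigger are all assumptions. How NovaScope actually generates the unit_id is not described here.

```python
import os
from pathlib import Path


def resolve_sgear(sge_dir, sgear_dir, unit_id=None, auto_unit_id="auto_unit"):
    """Return (unit_id, was_auto) after making sure sgeAR has an SGE matrix.

    If a unit_id was supplied, the manually prepared files are assumed to be
    in place already and nothing is linked. Otherwise every file in sge_dir
    is symlinked into sgear_dir and a placeholder unit_id is returned.
    """
    sge_dir, sgear_dir = Path(sge_dir), Path(sgear_dir)
    if unit_id is not None:
        return unit_id, False

    sgear_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(sge_dir.iterdir()):
        link = sgear_dir / src.name
        # Leave existing files or links untouched.
        if not link.exists() and not link.is_symlink():
            os.symlink(src.resolve(), link)
    return auto_unit_id, True
```

Linking rather than copying keeps the original matrix in sge as the single source, which matches the description above.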
Downstream Analysis
The aligned sequenced reads can be used directly for tasks that require read-level information, such as allele-specific expression or somatic variant analysis. The SGE can also be analyzed with many methods and software tools, such as Latent Dirichlet Allocation (LDA) and Seurat.
An exemplary downstream analysis is provided at NovaScope-exemplary-downstream-analysis.