Rule dge2sdge:¶
Purpose¶
The dge2sdge rule generates a spatial digital gene expression (SGE) matrix from the digital gene expression (DGE) matrices produced by alignment and from the spatial barcode maps.
Input Files¶
- Per-Chip Spatial Barcode Map and Manifest File
  Required input files include the spatial barcode map and manifest file for the chip of interest, which are created by the sbcd2chip rule.
- Per-Chip Matched Spatial Barcode Files
  The rule also requires matched spatial barcode files that contain the spatial barcodes matched to the 2nd-seq reads. These files are generated by the smatch rule.
- DGEs
  DGEs for each genomic feature, including Gene, GeneFull, splice junctions (SJ), and Velocyto, are produced by the align rule.
Output Files¶
The rule generates the following output in the specified directory path:
(1) A Spatial Digital Gene Expression (SGE) Matrix¶
Description: A transcript-indexed SGE in 10x Genomics format is generated, which contains all available genomic features.
File Naming Convention: The SGE is composed of barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz.
File Format:
- barcodes.tsv.gz:
    AAAAAAGGTACCCGCAGTGCGGACAAACGA 1 1 1 1 1214343 1498113 1,1,1,0,0
    AAAACAGGAGATTCAGAATGCAAAAATGAA 2 2 1 1 1029766 1669474 0,1,0,1,0
    AAAACTTGTCGAGCTCAGTGACGCGGGCTT 3 3 1 1 1366819 1170486 2,2,1,0,1
  - Column 1: sorted spatial barcodes
  - Column 2: 1-based integer index of the spatial barcode
  - Column 3: 1-based integer index from the full barcode that is in the STARsolo output
  - Column 4: Lane ID, which is defined as 1.
  - Column 5: Tile ID, which is defined as 1.
  - Column 6: X-coordinate within the chip (global X-coordinate).
  - Column 7: Y-coordinate within the chip (global Y-coordinate).
  - Column 8: Five comma-separated numbers denote the count per spatial barcode for each genomic feature, in the order of Gene, GeneFull, Spliced, Unspliced, and Ambiguous.
- features.tsv.gz:
    ENSMUSG00000100764 Gm29155 1 2,2,1,0,1
    ENSMUSG00000100635 Gm29157 2 0,0,0,0,0
    ENSMUSG00000100480 Gm29156 3 0,0,0,0,0
  - Column 1: Ensembl gene ID
  - Column 2: Gene symbol
  - Column 3: 1-based integer index, which is used as the row index in matrix.mtx.gz
  - Column 4: Five comma-separated numbers denote the count per gene for each genomic feature, in the order of Gene, GeneFull, Spliced, Unspliced, and Ambiguous.
- matrix.mtx.gz:
    %%MatrixMarket matrix coordinate integer general
    %
    33989 1197304 2488321
    5743 1 1 1 1 0 0
    6002 2 0 1 0 1 0
    7279 3 1 1 1 0 0
  - Header: The initial lines form the header, which declares that the matrix follows the Matrix Market (MTX) format and describes its properties. The header may include comment lines, each beginning with %, that carry extra metadata.
  - Dimensions: The first line after the header gives the matrix dimensions: the number of rows (features), the number of columns (barcodes), and the number of non-zero entries.
  - Data Entries: The remaining lines list the non-zero entries in seven columns: row index (feature index), column index (barcode index), and five expression values that correspond to Gene, GeneFull, Spliced, Unspliced, and Ambiguous.
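The column layouts described above can be read with a short Python sketch. This is an illustration only, not part of NovaScope: the names `Barcode`, `parse_barcode_line`, and `iter_matrix_entries` are hypothetical, and the sketch assumes whitespace-delimited fields as in the samples shown. Because each data entry carries five values rather than one, generic Matrix Market readers that expect a single value per entry may not accept this file, so the sketch parses it line by line.

```python
import io
from typing import Iterator, NamedTuple, TextIO, Tuple

# Order of the five per-feature counts, as listed in the column descriptions.
FEATURE_TYPES = ("Gene", "GeneFull", "Spliced", "Unspliced", "Ambiguous")


class Barcode(NamedTuple):
    """One row of barcodes.tsv.gz (hypothetical container for this sketch)."""
    sequence: str
    index: int
    starsolo_index: int
    lane: int
    tile: int
    x: int
    y: int
    counts: Tuple[int, ...]


def parse_counts(text: str) -> Tuple[int, ...]:
    """Split the comma-separated count field into five integers."""
    counts = tuple(int(value) for value in text.split(","))
    if len(counts) != len(FEATURE_TYPES):
        raise ValueError(f"expected {len(FEATURE_TYPES)} counts, got {len(counts)}")
    return counts


def parse_barcode_line(line: str) -> Barcode:
    """Parse one eight-column barcode row (assumes whitespace-delimited fields)."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}")
    index, starsolo_index, lane, tile, x, y = (int(v) for v in fields[1:7])
    return Barcode(fields[0], index, starsolo_index, lane, tile, x, y,
                   parse_counts(fields[7]))


def iter_matrix_entries(handle: TextIO) -> Iterator[Tuple[int, int, Tuple[int, ...]]]:
    """Yield (feature_index, barcode_index, five_counts) for each data entry."""
    dimensions_seen = False
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # header and comment lines
        fields = line.split()
        if not dimensions_seen:
            dimensions_seen = True  # rows, columns, non-zero entries
            continue
        if len(fields) != 7:
            raise ValueError(f"expected 7 columns, got {len(fields)}")
        yield int(fields[0]), int(fields[1]), tuple(int(v) for v in fields[2:])


# Sample content copied from the format description above.
SAMPLE_MATRIX = """%%MatrixMarket matrix coordinate integer general
%
33989 1197304 2488321
5743 1 1 1 1 0 0
6002 2 0 1 0 1 0
7279 3 1 1 1 0 0
"""

sample_barcode = parse_barcode_line(
    "AAAAAAGGTACCCGCAGTGCGGACAAACGA 1 1 1 1 1214343 1498113 1,1,1,0,0"
)
sample_entries = list(iter_matrix_entries(io.StringIO(SAMPLE_MATRIX)))
```

To read the real compressed files, the same functions can be given a handle from `gzip.open(path, "rt")` in place of the in-memory sample.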
(2) A Comprehensive View of Spatial Barcodes Distribution¶
Description: A side-by-side presentation of three sets of barcodes: all spatial barcodes from the spatial map (see Rule sbcd2chip), matched spatial barcodes (see Rule smatch), and aligned spatial barcodes.
File Naming Convention:
File Visualization:
(3) A Metadata File for X Y Coordinates¶
Description: This file contains the minimum and maximum X and Y coordinates, which are required by the reformatting features.
File Naming Convention:
File Format:
- xmin: The minimum x-coordinate across all barcodes in the SGE.
- xmax: The maximum x-coordinate across all barcodes in the SGE.
- ymin: The minimum y-coordinate across all barcodes in the SGE.
- ymax: The maximum y-coordinate across all barcodes in the SGE.
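As a cross-check, the four values can be recomputed from columns 6 and 7 of barcodes.tsv.gz. The sketch below is a minimal illustration under that assumption; `coordinate_range` is a hypothetical helper, not a NovaScope function, and it does not reproduce the on-disk layout of the metadata file.

```python
from typing import Dict, Iterable


def coordinate_range(barcode_lines: Iterable[str]) -> Dict[str, int]:
    """Return xmin, xmax, ymin, ymax over rows of barcodes.tsv.gz.

    Assumes whitespace-delimited rows with the X-coordinate in column 6
    and the Y-coordinate in column 7, as described for the SGE barcodes.
    """
    xmin = xmax = ymin = ymax = None
    for line in barcode_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or malformed rows
        x, y = int(fields[5]), int(fields[6])
        xmin = x if xmin is None else min(xmin, x)
        xmax = x if xmax is None else max(xmax, x)
        ymin = y if ymin is None else min(ymin, y)
        ymax = y if ymax is None else max(ymax, y)
    if xmin is None:
        raise ValueError("no barcode rows found")
    return {"xmin": xmin, "xmax": xmax, "ymin": ymin, "ymax": ymax}


# The three sample rows from the barcodes.tsv.gz description.
SAMPLE_ROWS = [
    "AAAAAAGGTACCCGCAGTGCGGACAAACGA 1 1 1 1 1214343 1498113 1,1,1,0,0",
    "AAAACAGGAGATTCAGAATGCAAAAATGAA 2 2 1 1 1029766 1669474 0,1,0,1,0",
    "AAAACTTGTCGAGCTCAGTGACGCGGGCTT 3 3 1 1 1366819 1170486 2,2,1,0,1",
]
sample_range = coordinate_range(SAMPLE_ROWS)
```

Tracking a running minimum and maximum keeps memory use constant, which matters when the barcode file has more than a million rows.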
Output Guidelines¶
Review the composite image that displays the "sbcd", "smatch", and "sge" images together to confirm that the three images are consistent with one another.
Parameters¶
- The layout Parameter
  The layout parameter sets the layout used for the RGB plots. When it is absent, NovaScope uses the predefined layout file.
Dependencies¶
The outputs of Rules sbcd2chip, smatch, and align serve as the required input for dge2sdge. Rule dge2sdge can therefore execute only after sbcd2chip, smatch, and align have completed successfully. See an overview of the rule dependencies in the Workflow Structure.
Code Snippet¶
The code for this rule is provided in a05_dge2sdge.smk.