Rule dge2sdge:

Purpose

The dge2sdge rule generates a spatial digital gene expression matrix (SGE) from the digital gene expression matrix (DGE) produced by alignment and the spatial barcode maps.

Input Files

  • Per-Chip Spatial Barcode Map and Manifest File: The spatial barcode map and manifest file for the chip of interest, both created by the sbcd2chip rule.
  • Per-Chip Matched Spatial Barcode Files: Files containing the spatial barcodes matched to the 2nd-seq reads, generated by the smatch rule.
  • DGEs: DGEs for each genomic feature, including Gene, GeneFull, splice junctions (SJ), and Velocyto, produced by the align rule.

Output Files

The rule generates the following output in the specified directory path:

<output_directory>/align/<flowcell_id>/<chip_id>/<run_id>/sge

(1) Spatial Digital Gene Expression Matrix (SGE)

Description: A spatial digital gene expression matrix (SGE) in 10x Genomics format, containing all available genomic features.

File Naming Convention: The SGE is composed of barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz.
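
As a quick sanity check after the rule finishes, the output location and the three expected files can be verified with a small script. The sketch below is illustrative only: the helper names `sge_dir` and `missing_sge_files` are hypothetical and not part of NovaScope, and the path layout is taken from the output directory pattern shown above.

```python
from pathlib import Path

# File names listed in the naming convention above.
SGE_FILES = ("barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz")


def sge_dir(output_directory, flowcell_id, chip_id, run_id) -> Path:
    """Build <output_directory>/align/<flowcell_id>/<chip_id>/<run_id>/sge."""
    return Path(output_directory) / "align" / flowcell_id / chip_id / run_id / "sge"


def missing_sge_files(directory) -> list:
    """Return the expected SGE file names that are absent from `directory`."""
    return [name for name in SGE_FILES if not (Path(directory) / name).is_file()]
```

An empty list from `missing_sge_files` means all three files are present; it does not validate their contents.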

File Format:

  • barcodes.tsv.gz:

    AAAAAAGGTACCCGCAGTGCGGACAAACGA  1   1   1   1   1214343 1498113 1,1,1,0,0
    AAAACAGGAGATTCAGAATGCAAAAATGAA  2   2   1   1   1029766 1669474 0,1,0,1,0
    AAAACTTGTCGAGCTCAGTGACGCGGGCTT  3   3   1   1   1366819 1170486 2,2,1,0,1
    

    • Column 1: sorted spatial barcodes
    • Column 2: 1-based integer index of the spatial barcode
    • Column 3: 1-based integer index of the barcode within the full barcode list in the STARsolo output
    • Column 4: Lane ID, which is defined as 1.
    • Column 5: Tile ID, which is defined as 1.
    • Column 6: X-coordinate within the chip (global X-coordinate).
    • Column 7: Y-coordinate within the chip (global Y-coordinate).
    • Column 8: Five comma-separated numbers denote the count per spatial barcode for each genomic feature, in the order of Gene, GeneFull, Spliced, Unspliced, and Ambiguous.
  • features.tsv.gz:

    ENSMUSG00000100764  Gm29155 1   2,2,1,0,1
    ENSMUSG00000100635  Gm29157 2   0,0,0,0,0
    ENSMUSG00000100480  Gm29156 3   0,0,0,0,0
    

    • Column 1: Ensembl gene ID
    • Column 2: Gene symbol
    • Column 3: 1-based integer index which will be used in matrix.mtx.gz
    • Column 4: Five comma-separated numbers denote the count per gene for each genomic feature, in the order of Gene, GeneFull, Spliced, Unspliced, and Ambiguous.
  • matrix.mtx.gz:

    %%MatrixMarket matrix coordinate integer general
    %
    33989 1197304 2488321
    5743 1 1 1 1 0 0
    6002 2 0 1 0 1 0
    7279 3 1 1 1 0 0
    

    • Header: The initial lines form the header, which declares that the file follows the Matrix Market (MTX) format and describes the matrix's properties. It may include comment lines, which begin with "%", carrying extra metadata.
    • Dimensions: The first line after the header gives the matrix dimensions: the number of rows (features), columns (barcodes), and non-zero entries.
    • Data Entries: The remaining lines list the non-zero entries in seven columns: row index (feature index), column index (barcode index), and five values (expression levels) that correspond to Gene, GeneFull, Spliced, Unspliced, and Ambiguous.
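
The three file layouts above can be read with a few lines of Python. The following is a minimal sketch, not NovaScope's own reader: the function and class names are hypothetical, fields are split on any whitespace (the real files are assumed to be tab-separated, with space-separated matrix entries), and the whole matrix is loaded into a list, which is only practical for small inputs. For the real gzipped files, a handle from `gzip.open(path, "rt")` can be passed in place of the in-memory sample.

```python
import io
from typing import NamedTuple, TextIO

# Order of the five per-feature counts, as listed in the column descriptions.
FEATURE_TYPES = ("Gene", "GeneFull", "Spliced", "Unspliced", "Ambiguous")


class Barcode(NamedTuple):
    sequence: str  # column 1
    index: int     # column 2
    x: int         # column 6, global X-coordinate
    y: int         # column 7, global Y-coordinate
    counts: dict   # column 8, keyed by feature type


def parse_counts(field: str) -> dict:
    """Split a comma-separated count field such as '1,1,1,0,0'."""
    return dict(zip(FEATURE_TYPES, (int(v) for v in field.split(","))))


def parse_barcode_line(line: str) -> Barcode:
    f = line.split()
    return Barcode(f[0], int(f[1]), int(f[5]), int(f[6]), parse_counts(f[7]))


def parse_feature_line(line: str) -> tuple:
    """Return (gene ID, gene symbol, 1-based index, per-feature counts)."""
    f = line.split()
    return f[0], f[1], int(f[2]), parse_counts(f[3])


def read_mtx(handle: TextIO):
    """Return (dimensions, entries) from a seven-column MTX text stream."""
    dims, entries = None, []
    for line in handle:
        if line.startswith("%") or not line.strip():
            continue  # header, comment, or blank line
        values = [int(v) for v in line.split()]
        if dims is None:
            dims = tuple(values)  # (features, barcodes, non-zero entries)
        else:
            counts = dict(zip(FEATURE_TYPES, values[2:7]))
            entries.append((values[0], values[1], counts))
    return dims, entries


# Sample taken from the matrix.mtx.gz excerpt above.
SAMPLE_MTX = """%%MatrixMarket matrix coordinate integer general
%
33989 1197304 2488321
5743 1 1 1 1 0 0
6002 2 0 1 0 1 0
7279 3 1 1 1 0 0
"""

dims, entries = read_mtx(io.StringIO(SAMPLE_MTX))
```

Because each entry carries five values rather than one, generic Matrix Market readers that expect a single value per entry may not accept this file directly, which is why the sketch parses the lines by hand.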

(2) A Comprehensive View of "sbcd", "smatch", and "sge" Images

Description: A side-by-side presentation of three images: the "sbcd" image from sbcd2chip, the "smatch" image from smatch, and the "sge" image, which shows the distribution of spatial barcodes aligned to the reference genome, generated by the current rule.

File Naming Convention:

<run_id>.sge_match_sbcd.png

File Visualization:

[Image: sge_match_sbcd_image]

(3) A Metadata File for X Y Coordinates

Description: This file contains the minimum and maximum X and Y coordinates, which are required by the reformatting features.

File Naming Convention:

barcodes.minmax.tsv

File Format:

xmin  xmax      ymin  ymax
0     12810899  0     6950609

  • xmin: The minimum x-coordinate in nanometers across all barcodes in the SGE.
  • xmax: The maximum x-coordinate in nanometers across all barcodes in the SGE.
  • ymin: The minimum y-coordinate in nanometers across all barcodes in the SGE.
  • ymax: The maximum y-coordinate in nanometers across all barcodes in the SGE.
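
Since the coordinates are stored in nanometers, a common first step is to convert the bounding box to micrometers. The sketch below assumes the file is tab-separated with a single header row and a single data row, as in the example above; `read_minmax` and `extent_um` are hypothetical helper names, not NovaScope functions.

```python
import csv
import io


def read_minmax(handle) -> dict:
    """Read xmin, xmax, ymin, ymax (in nanometers) from a tab-separated stream."""
    row = next(csv.DictReader(handle, delimiter="\t"))
    return {key: int(value) for key, value in row.items()}


def extent_um(minmax: dict) -> tuple:
    """Return (width, height) of the bounding box in micrometers."""
    width_nm = minmax["xmax"] - minmax["xmin"]
    height_nm = minmax["ymax"] - minmax["ymin"]
    return width_nm / 1000.0, height_nm / 1000.0


# Sample taken from the barcodes.minmax.tsv excerpt above.
SAMPLE_MINMAX = "xmin\txmax\tymin\tymax\n0\t12810899\t0\t6950609\n"

minmax = read_minmax(io.StringIO(SAMPLE_MINMAX))
width_um, height_um = extent_um(minmax)
```

For the sample values, the bounding box is roughly 12.8 mm wide and 7.0 mm tall.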

Output Guidelines

Review the composite image, which shows the "sbcd", "smatch", and "sge" images side by side, to confirm that the three images are consistent with one another.

Parameters

upstream:
  dge2sdge:
    layout: null

  • The layout Parameter: Specifies the layout file applied to the RGB plots. When absent, NovaScope uses the predefined layout file.

Dependencies

The outputs of the sbcd2chip, smatch, and align rules serve as the required inputs for dge2sdge. As a result, dge2sdge can execute only after sbcd2chip, smatch, and align have completed successfully. See an overview of the rule dependencies in the Workflow Structure.

Code Snippet

The code for this rule is provided in a05_dge2sdge.smk.