spatula dge2sge

Summary

spatula dge2sge takes the digital expression output of STARsolo, joins it with the spatial barcode map, and produces the spatial gene expression (SGE) matrix.

Here is a summary of a typical use case:

  • Input: Takes (a) a spatial barcode map of matching barcodes (produced by match-sbcds) and (b) a 10X-format digital expression matrix (with barcodes.tsv.gz, features.tsv.gz, and one or more matrix.mtx.gz files).
  • Output: Produces a spatial gene expression (SGE) matrix, which is similar to the 10X format, but (a) carries additional information such as spatial coordinates, and (b) can annotate multiple types of readout in a single output matrix.

An example usage of the tool is as follows:

spatula dge2sge  --match /path/to/input.match.sorted.uniq.tsv.gz \
                 --bcd /path/to/STARsolo.out/GeneFull/raw/barcodes.tsv.gz \
                 --ftr /path/to/STARsolo.out/GeneFull/raw/features.tsv.gz \
                 --mtx /path/to/STARsolo.out/Gene/raw/matrix.mtx.gz \
                 --mtx /path/to/STARsolo.out/GeneFull/raw/matrix.mtx.gz \
                 --mtx /path/to/STARsolo.out/Velocyto/raw/spliced.mtx.gz \
                 --mtx /path/to/STARsolo.out/Velocyto/raw/unspliced.mtx.gz \
                 --mtx /path/to/STARsolo.out/Velocyto/raw/ambiguous.mtx.gz \
                 --out /path/to/output/dir/

See below for a more detailed usage description.

Required options

  • --match : The path to the spatial barcode map (*.match.sorted.uniq.tsv.gz) file that contains matching barcodes to the 2nd-seq FASTQ file, generated by match-sbcds. Instead of --match, the full sbcd directory can be used with --sbcd option (see Additional options for details).
  • --bcd : The path to the barcodes.tsv.gz file that contains the full list of barcodes from the STARsolo output. Each line contains a single column with the barcode sequence, following the Market Exchange format.
  • --ftr : The path to the features.tsv.gz file that contains the full list of genes (or features) from the STARsolo output. Each line contains three columns: gene_id, gene_name, and feature_type, following the Market Exchange format.
  • --mtx : (allows multiple entries) The path to the matrix.mtx.gz file that contains the digital expression matrix from the STARsolo output. After three header lines, each line of the matrix file contains (a) barcode index (1-based), (b) gene index (1-based), and (c) count, following the Market Exchange format. Multiple --mtx options can be used as long as they share the same barcodes and feature information.
  • --out : Output directory that stores the spatial gene expression (SGE) output files. See Expected Output for more details.
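
Because every --mtx file must share the same barcodes and features, it can be useful to compare the matrix dimensions before running the tool. The following is a minimal sketch, not part of spatula: the helper names read_mtx_shape and check_shared_shape are hypothetical, and the code assumes each file is a gzipped Matrix Market file whose first non-comment line is the size line (rows, columns, number of entries).

```python
import gzip


def read_mtx_shape(path):
    """Return (n_rows, n_cols, n_entries) from a gzipped Matrix Market file.

    Assumption: header/comment lines start with '%', and the first other
    line is the size line with three integers.
    """
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.startswith("%"):
                continue  # skip the banner and comment lines
            rows, cols, nnz = (int(tok) for tok in line.split()[:3])
            return rows, cols, nnz
    raise ValueError(f"no size line found in {path}")


def check_shared_shape(paths):
    """Return the common (n_rows, n_cols); raise ValueError on a mismatch."""
    if not paths:
        raise ValueError("at least one matrix file is required")
    shapes = {p: read_mtx_shape(p)[:2] for p in paths}
    if len(set(shapes.values())) > 1:
        raise ValueError(f"matrix dimensions differ: {shapes}")
    return next(iter(shapes.values()))
```

Matching dimensions are only a necessary condition. This check does not confirm that the files were actually produced against the same barcodes.tsv.gz and features.tsv.gz.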

Additional Options

  • --sbcd : Directory that contains the spatial barcode dictionary per tile. This can be used instead of the --match option if the list of matching barcodes is unavailable.
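
When the command is assembled from a script, the repeated --mtx options and the choice between --match and --sbcd can be handled programmatically. The sketch below is illustrative only: build_dge2sge_cmd is a hypothetical helper, it uses only the options documented above, and the rule that exactly one of --match or --sbcd is given is an assumption read from "instead of", not a documented constraint of the tool.

```python
import shlex


def build_dge2sge_cmd(bcd, ftr, mtx_paths, out_dir,
                      match=None, sbcd=None, exe="spatula"):
    """Build the argument list for 'spatula dge2sge'.

    Assumption: exactly one of match / sbcd is supplied.
    """
    if (match is None) == (sbcd is None):
        raise ValueError("provide exactly one of match or sbcd")
    if not mtx_paths:
        raise ValueError("at least one --mtx path is required")
    cmd = [exe, "dge2sge"]
    cmd += ["--match", match] if match is not None else ["--sbcd", sbcd]
    cmd += ["--bcd", bcd, "--ftr", ftr]
    for path in mtx_paths:
        cmd += ["--mtx", path]  # one --mtx flag per matrix, order preserved
    cmd += ["--out", out_dir]
    return cmd


# Placeholder paths, mirroring the example usage above.
cmd = build_dge2sge_cmd(
    bcd="/path/to/STARsolo.out/GeneFull/raw/barcodes.tsv.gz",
    ftr="/path/to/STARsolo.out/GeneFull/raw/features.tsv.gz",
    mtx_paths=["/path/to/STARsolo.out/GeneFull/raw/matrix.mtx.gz"],
    out_dir="/path/to/output/dir/",
    sbcd="/path/to/sbcd/dir/",
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True). Keeping the --mtx order explicit matters because the per-file counts in the output follow that order.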

Expected Output

In the output directory [outdir], the following files will be created.

  • [outdir]/barcodes.tsv.gz contains the list of barcodes in the SGE matrix. Each line contains the following information:
    1. Barcode sequence
    2. Increasing index of the barcode (1-based), which is not necessarily contiguous.
    3. Sequential and contiguous index of the barcode (1-based). This should match the integer IDs of barcodes in the matrix.mtx.gz file.
    4. Lane of spatial coordinate
    5. Tile of spatial coordinate
    6. X-coordinate of spatial coordinate
    7. Y-coordinate of spatial coordinate
    8. Comma-separated counts of observations in each matrix.mtx.gz file. The order of the counts should match the order of --mtx options.
  • [outdir]/features.tsv.gz contains the list of genes in the SGE matrix. Each line contains the following information:
    1. Gene ID (unique identifier)
    2. Gene name
    3. Sequential and contiguous index of the gene (1-based). This should match the integer IDs of genes in the matrix.mtx.gz file.
    4. Comma-separated counts of observations in each matrix.mtx.gz file. The order of the counts should match the order of --mtx options.
  • [outdir]/matrix.mtx.gz contains the spatial expression matrix in the SGE format. After three header lines following the Market Exchange format, the matrix file contains the following information, separated by spaces.
    1. barcode index (1-based)
    2. gene index (1-based)
    3. (multiple space-separated entries) counts of observations, in the order of input files in --mtx options.
  • [outdir]/[lane] directory contains barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz, for the specific lane, in the same format.
  • [outdir]/[lane]/[tile] directory contains barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz, for the specific tile, in the same format.
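
The column layouts listed above can be read with a few lines of Python. This is a rough sketch under stated assumptions rather than an official parser: the names SgeBarcode, parse_barcode_line, and parse_matrix_entry are hypothetical, barcodes.tsv.gz is assumed to be tab-separated (as the file name suggests), lane and tile are kept as strings, and the X/Y coordinates are assumed to be integers.

```python
from typing import NamedTuple, Tuple


class SgeBarcode(NamedTuple):
    barcode: str
    index: int        # increasing, not necessarily contiguous
    seq_index: int    # contiguous; matches the IDs used in matrix.mtx.gz
    lane: str
    tile: str
    x: int            # assumption: integer coordinates
    y: int
    counts: Tuple[int, ...]  # one count per --mtx input, in order


def parse_barcode_line(line):
    """Parse one line of the SGE barcodes.tsv.gz (assumed tab-separated)."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 8:
        raise ValueError(f"expected 8 columns, found {len(f)}")
    counts = tuple(int(c) for c in f[7].split(","))
    return SgeBarcode(f[0], int(f[1]), int(f[2]), f[3], f[4],
                      int(f[5]), int(f[6]), counts)


def parse_matrix_entry(line, n_mtx):
    """Parse one data line of the SGE matrix.mtx.gz.

    Returns (barcode_index, gene_index, counts), where counts has one
    value per --mtx input, following the column order described above.
    """
    tok = line.split()
    if len(tok) != 2 + n_mtx:
        raise ValueError(f"expected {2 + n_mtx} fields, found {len(tok)}")
    return int(tok[0]), int(tok[1]), tuple(int(t) for t in tok[2:])
```

To read a real file, open it with gzip.open(path, "rt"), and for matrix.mtx.gz skip the header lines before passing each remaining line to parse_matrix_entry.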

Full Usage

The full usage of spatula dge2sge can be viewed with the --help option:

$  ./spatula dge2sge --help   
[./spatula dge2sge] -- Convert DGE (from STARsolo) into SGE format

 Copyright (c) 2022-2024 by Hyun Min Kang
 Licensed under the Apache License v2.0 http://www.apache.org/licenses/

Detailed instructions of parameters are available. Ones with "[]" are in effect:

Available Options:

== Input options ==
   --sbcd  [STR: ]             : Spatial barcode dictionary generated from 'build-sbcds' command
   --match [V_STR: ]           : List of spatial barcode files that were used for whitelist generation
   --bcd   [STR: ]             : Shared barcode file path (e.g. barcodes.tsv.gz)
   --ftr   [STR: ]             : Shared feature file path (e.g. feature.tsv.gz)
   --mtx   [V_STR: ]           : Shared matrix file path (e.g. matrix.mtx.gz)

== Output Options ==
   --out   [STR: ]             : Output directory


NOTES:
When --help was included in the argument. The program prints the help message but do not actually run