
spatula combine-sbcds

Summary

spatula combine-sbcds combines spatial barcodes across multiple tiles according to a specified tile layout. It can be used to generate a spatial barcode map for each 'Chip', the physical unit of a spatial transcriptomics experiment on the Seq-Scope platform.

Here is a summary of the main features:

  • Input: Takes (a) a spatial barcode directory from build-sbcds with --sbcd, (b) a manifest file summarizing the files from build-sbcds with --manifest, and (c) a tile layout for the specified Chip with --layout. Additional options may be used.
  • Output: Produces a combined spatial barcode map in the output directory, consisting of a single sorted spatial barcode file in tsv.gz format and a manifest.tsv file. The output spatial coordinates are converted to nanometer scale.

An example use of the tool is as follows:

spatula combine-sbcds --layout /path/to/layout.tsv \
                      --manifest /path/to/manifest.tsv \
                      --sbcd /path/to/sbcd/dir/ \
                      --out /path/to/output/dir/ \
                      --rowgap 0.0517 --colgap --0.0048 \
                      --max-dup 1 --max-dup-dist-nm 1
See below for a more detailed usage description.

Required options

  • --sbcd : The path to the directory containing spatial barcode files, generated by build-sbcds.
  • --manifest : The manifest.tsv file containing the summary of each file, generated by build-sbcds.
  • --layout : The layout file specifying the spatial arrangement of the tiles to be combined. (If --layout is not available, an --offset file can be specified instead). The layout file must contain the following columns:
    • lane : The lane number of the tile (can be replaced by the id column)
    • tile : The tile number of the tile (can be replaced by the id column)
    • row : The row number of the tile
    • col : The column number of the tile
    • rowshift : (Optional) Vertical shift of the row coordinate in proportion to the tile height.
    • colshift : (Optional) Horizontal shift of the column coordinate in proportion to the tile width.
    • id : (Optional) The ID of each tile, in the format lane_tile. It can be used instead of the lane and tile columns.
  • --out : The output directory that stores the combined spatial barcode file and a manifest file. See Expected Output for more details.
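
A layout file with the required columns can be generated programmatically. The following is a minimal sketch, assuming the layout file is tab-separated with a header row naming the columns. The helper name write_layout, the lane and tile numbers, the 2x2 grid, and the 1-based row and column numbering are all hypothetical and chosen only for illustration.

```python
import csv
import io


def write_layout(handle, tiles):
    """Write a layout table with lane, tile, row, and col columns.

    `tiles` is an iterable of (lane, tile, row, col) tuples.
    Assumption: the layout file is tab-separated with a header row.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["lane", "tile", "row", "col"])
    for lane, tile, row, col in tiles:
        writer.writerow([lane, tile, row, col])


# Hypothetical 2x2 arrangement of four tiles from lane 1.
example_tiles = [
    (1, 2101, 1, 1),
    (1, 2102, 1, 2),
    (1, 2201, 2, 1),
    (1, 2202, 2, 2),
]

# Write to an in-memory buffer here; pass an open file handle to create layout.tsv.
buffer = io.StringIO()
write_layout(buffer, example_tiles)
layout_text = buffer.getvalue()
print(layout_text, end="")
```

The optional rowshift and colshift columns are omitted in this sketch; they could be appended as extra columns in the same way.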

Additional options

  • --offset : The offset file can be used instead of --layout to specify the spatial arrangement of tiles. If the spatial arrangement between tiles is more complicated than a grid-like arrangement, the offset file can handle a more general arrangement. The offset file should contain the following columns:
    • lane : The lane number of the tile (can be replaced by the id column)
    • tile : The tile number of the tile (can be replaced by the id column)
    • offset_x : The offset for the x-coordinate. This is an offset to add to the x-coordinate of the tile in the scale of (original) pixels per tile.
    • offset_y : The offset for the y-coordinate. This is an offset to add to the y-coordinate of the tile in the scale of (original) pixels per tile.
    • id : (Optional) The ID of each tile, in the format lane_tile. It can be used instead of the lane and tile columns.
  • --pixel-to-nm : The conversion factor from pixel coordinates to nanometer scale. The default is 34.78 nm/pixel, which is the resolution of the Seq-Scope platform with the Illumina NovaSeq 6000. For the Seq-Scope platform with the Illumina HiSeq 2500, the resolution is 37.5 nm/pixel.
  • --rowgap : The additional gap between rows, proportional to the maximum height of the tiles. This is used to adjust the spatial coordinates of the tiles. The default value is 0.0.
  • --colgap : The additional gap between columns, proportional to the maximum width of the tiles. This is used to adjust the spatial coordinates of the tiles. The default value is 0.0.
  • --match-len : The length of the spatial barcode to be considered for a match. The default is 27. The maximum possible value is 27.
  • --max-dup : The number of duplicates to allow for each spatial barcode, within the maximum distance threshold specified by --max-dup-dist-nm. The default is 1, which means no duplicates are allowed.
  • --max-dup-dist-nm : The maximum distance in nanometers allowed for duplicates. The default is 10000.0 nm.
  • --write-all : Write all spatial barcodes to the output file, including duplicate and filtered reads. By default, only the unique spatial barcodes that pass the filtering threshold are written to the output file.
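
The relationship between the offset columns and --pixel-to-nm can be illustrated with a short sketch. It assumes that the per-tile offset is added to the pixel coordinate first and that the sum is then multiplied by the conversion factor; this order of operations is inferred from the option descriptions above, not taken from the spatula source code. The function name to_nanometers and the sample coordinates are hypothetical.

```python
import math


def to_nanometers(x_px, y_px, offset_x=0.0, offset_y=0.0, nm_per_pixel=34.78):
    """Convert within-tile pixel coordinates to combined nanometer coordinates.

    Assumption: offsets are expressed in original pixels and are applied
    before scaling by the pixel-to-nm factor.
    """
    x_nm = (x_px + offset_x) * nm_per_pixel
    y_nm = (y_px + offset_y) * nm_per_pixel
    return x_nm, y_nm


# A point at pixel (100, 200) in a tile with no offset, default resolution.
x_default, y_default = to_nanometers(100, 200)

# The same point in a tile shifted by 1000 pixels in x, using the
# 37.5 nm/pixel resolution listed above for the HiSeq 2500.
x_hiseq, y_hiseq = to_nanometers(100, 200, offset_x=1000, nm_per_pixel=37.5)

print(x_default, y_default)
print(x_hiseq, y_hiseq)
```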

Expected Output

In the output directory [outdir], the following files will be created.

  • [outdir]/manifest.tsv : The manifest file describing the combined spatial barcode file. Its entry contains the following information:
    • id : ID of the tile as [lane]_[tile]
    • filepath : The filename of the combined spatial barcode map, which will be 1_1.sbcds.sorted.tsv.gz because there will be only a single lane and tile (1_1) in the output.
    • barcodes : The total number of spatial barcodes found in the tile.
    • matches : The number of barcodes that match the expected format.
    • mismatches : The number of barcodes that do not match the expected format.
    • xmin : The minimum value of the x coordinate in nanometers.
    • xmax : The maximum value of the x coordinate in nanometers.
    • ymin : The minimum value of the y coordinate in nanometers.
    • ymax : The maximum value of the y coordinate in nanometers.
  • [outdir]/1_1.sbcds.sorted.tsv.gz : The combined spatial barcode file in compressed tsv format. Each column in the tsv file contains the following information:
    1. Spatial barcode sequences (in reverse complement, if specified in the format).
    2. Lane (always 1)
    3. Tile (always 1)
    4. x-coordinate in nanometer scale
    5. y-coordinate in nanometer scale
    6. Number of bases mismatching to the expected pattern defined by the format (0 is perfect match).
  • [outdir]/dupstats.tsv.gz : Duplicate barcode statistics. For each duplicate barcode, the number of duplicates and the maximum distance between them (if within the --max-dup-dist-nm threshold) are reported.
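
The combined barcode file can be read with standard tools. The following is a minimal sketch using only the Python standard library. It assumes the file has no header line and contains the six columns in the order listed above; the names SpatialBarcode and read_combined_sbcds are hypothetical, and the demo rows are made-up values used only to exercise the parser.

```python
import csv
import gzip
import os
import tempfile
from typing import Iterator, NamedTuple


class SpatialBarcode(NamedTuple):
    barcode: str      # spatial barcode sequence
    lane: int         # always 1 in the combined output
    tile: int         # always 1 in the combined output
    x_nm: float       # x-coordinate in nanometers
    y_nm: float       # y-coordinate in nanometers
    mismatches: int   # bases mismatching the expected pattern


def read_combined_sbcds(path) -> Iterator[SpatialBarcode]:
    """Yield one record per line of a combined *.sbcds.sorted.tsv.gz file.

    Assumption: tab-separated, no header, six columns in documented order.
    """
    with gzip.open(path, "rt", newline="") as handle:
        for fields in csv.reader(handle, delimiter="\t"):
            barcode, lane, tile, x, y, mismatches = fields[:6]
            yield SpatialBarcode(
                barcode, int(lane), int(tile), float(x), float(y), int(mismatches)
            )


# Demo on a temporary file with made-up rows (not real data).
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "1_1.sbcds.sorted.tsv.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write("AAACCC\t1\t1\t1000\t2000\t0\n")
        out.write("GGGTTT\t1\t1\t1500\t2500\t1\n")
    records = list(read_combined_sbcds(demo_path))

for record in records:
    print(record)
```

Coordinates are parsed as floats here so that the sketch works whether the file stores them as integers or decimals.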

Full Usage

The full usage of the software tool is as follows:

$ ./spatula combine-sbcds --help
[./spatula combine-sbcds] -- Combine multiple SBCD files

 Copyright (c) 2022-2024 by Hyun Min Kang
 Licensed under the Apache License v2.0 http://www.apache.org/licenses/

Detailed instructions of parameters are available. Ones with "[]" are in effect:

Available Options:

== Input options ==
   --layout              [STR: ]             : Layout file, each containing [lane] [tile] and [row]/[col] as columns
   --offset              [STR: ]             : Offset file, each containing [lane] [tile] and [row]/[col] as columns
   --sbcd                [STR: ]             : Directory containing spatial barcode files
   --manifest            [STR: ]             : Manifest file containing the list of spatial barcode files
   --require-exact-match [FLG: OFF]          : Require exact match between manifest file and layout file. If false, layout can only contain subset of tiles in the manifest file

== Output Options ==
   --out                 [STR: ]             : Output spatial barcode file after merging
   --write-all           [FLG: OFF]          : Write all spatial barcodes to the output file, including duplicated and filtered reads

== Options for coordinate conversion ==
   --pixel-to-nm         [FLT: 34.78]        : Pixel to nm conversion factor (37.5 for Seq-Scope)
   --rowgap              [FLT: 0.00]         : Additional gap between rows (proportional to the height of a tile)
   --colgap              [FLT: 0.00]         : Additional gap between columns (proportional to the width of a tile)

== Options for duplicate filtering ==
   --match-len           [INT: 27]           : Length of HDMI spatial barcode to be considered for a match
   --max-dup             [INT: 1]            : Maximum number of duplicates allowed for each spatial barcode. If this is 1, duplicates are not allowed
   --max-dup-dist-nm     [FLT: 10000.00]     : Maximum distance allowed for duplicates in nm scale


NOTES:
When --help was included in the argument. The program prints the help message but do not actually run