
spatula match-sbcds

Summary

spatula match-sbcds matches a 2nd-seq FASTQ file against the spatial barcode map, and reports the matching barcodes and their spatial distribution.

Here is a summary of the main features:

  • Input: Takes (a) a spatial barcode directory from build-sbcds or combined-sbcds, and (b) a 2nd-seq FASTQ file (read 1 only).
  • Output: Produces (a) a full list of matching barcodes and their spatial coordinates, (b) summary metrics of reads matching the barcodes, and (c) per-tile summary metrics of matching barcodes.

An example usage of the tool is as follows:

spatula match-sbcds --fq /path/to/second_seq.R1.fastq.gz \
                    --sbcd /path/to/sbcd/dir/ \
                    --skip-sbcd 1 \
                    --out /path/to/output_prefix
See below for a more detailed usage description.

Required options

  • --fq : The path to the 2nd-seq FASTQ file (read 1) containing the sequences to be matched against the spatial barcode dictionary.
  • --sbcd : The path to the spatial barcode dictionary directory, generated by build-sbcds or combined-sbcds. It should contain a manifest.tsv file and the corresponding barcode map for each tile listed in the manifest file.
  • --out : The prefix for the output files. See Expected Output for more details.

Additional options

  • --batch : The size of a single batch for the matching process. Each batch is processed and written to temporary files, which are merged at the end. The default value is 300M (300000000).
  • --skip-sbcd : The number of bases to skip at the beginning of the read. This is useful when insufficient bases are sequenced in the 1st-seq spatial barcode, which is reverse complemented.
  • --skip-duplicates : Skip barcodes that occur multiple times in the barcode map. By default, one of the duplicate barcodes is arbitrarily selected and written to the output.
  • --match-len : The length of the spatial barcode to consider for a match. The default is 27, and the maximum possible value is 27.
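
The interaction of --skip-sbcd, --match-len, and --skip-duplicates can be pictured with a small Python sketch. This is an illustration only, not spatula's actual implementation: the function names query_key and build_lookup are hypothetical, the slicing rule (skip first, then take match-len bases) is an assumption about how the two options combine, and "keep the first occurrence" stands in for the arbitrary choice among duplicates described above.

```python
from typing import Dict, Iterable, Optional, Tuple

# (lane, tile, x, y); hypothetical representation of one barcode-map entry
Coord = Tuple[int, int, int, int]


def query_key(read_seq: str, skip_sbcd: int = 0, match_len: int = 27) -> Optional[str]:
    """Return the part of a read used as the lookup key.

    Assumption: the first `skip_sbcd` bases are dropped and the next
    `match_len` bases must match a dictionary barcode exactly.
    Returns None when the read is too short to provide a full key.
    """
    end = skip_sbcd + match_len
    if len(read_seq) < end:
        return None
    return read_seq[skip_sbcd:end]


def build_lookup(entries: Iterable[Tuple[str, Coord]],
                 skip_duplicates: bool = False) -> Dict[str, Coord]:
    """Build a barcode -> coordinate lookup from (barcode, coord) pairs.

    By default one entry per duplicated barcode is kept (here: the first
    one seen). With skip_duplicates=True, duplicated barcodes are dropped.
    """
    lookup: Dict[str, Coord] = {}
    duplicated = set()
    for barcode, coord in entries:
        if barcode in lookup:
            duplicated.add(barcode)
        else:
            lookup[barcode] = coord
    if skip_duplicates:
        for barcode in duplicated:
            del lookup[barcode]
    return lookup
```

With this sketch, a read is matched by calling build_lookup(...).get(query_key(read, skip_sbcd, match_len)); a result of None corresponds to a read counted as a miss.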

Expected Output

With [out_prefix] as the prefix, the following files will be created:

  • [out_prefix].match.sorted.uniq.tsv.gz : A compressed tsv file containing the following entries in each row:
    1. Spatial barcode (alphabetically sorted)
    2. Lane of spatial coordinate
    3. Tile of spatial coordinate
    4. X-coordinate of spatial coordinate
    5. Y-coordinate of spatial coordinate
    6. Number of bases that do not match the expected pattern defined by the format (0 is a perfect match).
    7. Number of occurrences in the 2nd-seq FASTQ file.
  • [out_prefix].summary.tsv : A tab-delimited file summarizing the overall statistics of matching barcodes
    • Type : The type of statistics, including the following values:
      • Total : All reads in the 2nd-seq FASTQ file.
      • Miss : Reads that do not contain matching spatial barcodes.
      • Match : Reads that match with a spatial barcode.
      • Unique : Unique spatial barcodes that have matches.
      • Dup(Exact) : Duplicate barcodes calculated as Match - Unique.
    • Reads : The number of reads or barcodes that match the type.
    • Fraction : The fraction of the reads (among all reads) that match the type.
  • [out_prefix].counts.tsv : A tab-delimited file summarizing the matching barcode statistics per tile. It contains the following entries in each row:
    • id : The ID of the tile, in the format of lane_tile.
    • filepath : The name of the tsv.gz file containing spatial barcode map.
    • barcodes : The total number of spatial barcodes found in the tile.
    • matches : The number of barcodes that match the expected format.
    • unique : The number of unique barcodes that match the expected format.
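
As a rough illustration of how the match file could be consumed downstream, the Python sketch below reads [out_prefix].match.sorted.uniq.tsv.gz using the seven columns listed above. It assumes the file has no header line, that lane and tile are best kept as strings, and that the coordinates and counts are integers; none of this is confirmed by the tool's documentation. The names MatchRow, read_matches, and summarize are hypothetical. The summarize helper recomputes Match, Unique, and Dup(Exact) under the further assumption that the 7th column sums to the number of matching reads.

```python
import csv
import gzip
from typing import Dict, Iterable, Iterator, NamedTuple


class MatchRow(NamedTuple):
    barcode: str      # column 1: spatial barcode
    lane: str         # column 2: lane of spatial coordinate
    tile: str         # column 3: tile of spatial coordinate
    x: int            # column 4: X-coordinate
    y: int            # column 5: Y-coordinate
    mismatches: int   # column 6: bases not matching the expected pattern
    count: int        # column 7: occurrences in the 2nd-seq FASTQ file


def read_matches(path: str) -> Iterator[MatchRow]:
    """Yield one MatchRow per line of a gzip-compressed match file.

    Assumption: tab-delimited, exactly seven columns, no header line.
    """
    with gzip.open(path, "rt", newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            barcode, lane, tile, x, y, mismatches, count = row
            yield MatchRow(barcode, lane, tile, int(x), int(y),
                           int(mismatches), int(count))


def summarize(rows: Iterable[MatchRow]) -> Dict[str, int]:
    """Recompute Match, Unique, and Dup(Exact) from the match rows.

    Assumption: Match equals the sum of per-barcode occurrence counts,
    and each row corresponds to one unique matched barcode.
    """
    match = 0
    unique = 0
    for row in rows:
        match += row.count
        unique += 1
    return {"Match": match, "Unique": unique, "Dup(Exact)": match - unique}
```

For example, summarize(read_matches("/path/to/output_prefix.match.sorted.uniq.tsv.gz")) could be compared against the Match, Unique, and Dup(Exact) rows of [out_prefix].summary.tsv as a sanity check.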

Full Usage

The full usage of the software tool is as follows:

$ ./spatula match-sbcds --help    
[./spatula match-sbcds] -- Match the FASTQ file containing spatial barcodes with the spatial barcode dictionary

 Copyright (c) 2022-2024 by Hyun Min Kang
 Licensed under the Apache License v2.0 http://www.apache.org/licenses/

Detailed instructions of parameters are available. Ones with "[]" are in effect:

Available Options:

== Input options ==
   --fq              [STR: ]             : FASTQ file read 1 containing 2nd-seq spatial barcode
   --sbcd            [STR: ]             : Spatial barcode dictionary generated from 'build-sbcds' command
   --batch           [INT: 300000000]    : Size of a single batch
   --skip-sbcd       [INT: 0]            : Skip first bases of spatial barcode (Read 1)
   --match-len       [INT: 27]           : Length of HDMI spatial barcodes to require perfect matches
   --skip-duplicates [FLG: OFF]          : Skip duplicate barcodes that occurs multiple times

== Output Options ==
   --out             [STR: ]             : Output prefix (index.tsv, matches.tsv.gz)


NOTES:
When --help was included in the argument. The program prints the help message but do not actually run