spatula match-sbcds
Summary
spatula match-sbcds matches a 2nd-seq FASTQ file against the spatial barcode map and identifies the matching barcodes and their spatial distribution.
Here is a summary of the main features:
- Input: Takes (a) a spatial barcode directory from build-sbcds or combined-sbcds, and (b) a 2nd-seq FASTQ file (read 1 only).
- Output: Produces (a) a full list of matching barcodes and their spatial coordinates, (b) summary metrics of reads matching the barcodes, and (c) per-tile summary metrics of matching barcodes.
An example usage of the tool is as follows:
spatula match-sbcds --fq /path/to/second_seq.R1.fastq.gz \
--sbcd /path/to/sbcd/dir/ \
--skip-sbcd 1 \
--out /path/to/output_prefix
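When the command is launched from a pipeline script, it can help to assemble the argument list programmatically. The following is a minimal Python sketch that uses only the options listed in the help output on this page. The helper name build_match_sbcds_cmd and its keyword arguments are hypothetical, and the sketch assumes the spatula executable is on the PATH.

```python
import shlex


def build_match_sbcds_cmd(fq, sbcd, out, skip_sbcd=0, match_len=27,
                          skip_duplicates=False, spatula="spatula"):
    """Assemble the argument list for `spatula match-sbcds` (hypothetical helper)."""
    if match_len > 27:
        # The documentation states that 27 is the maximum possible value.
        raise ValueError("--match-len cannot exceed 27")
    if skip_sbcd < 0:
        raise ValueError("--skip-sbcd cannot be negative")
    cmd = [spatula, "match-sbcds", "--fq", fq, "--sbcd", sbcd, "--out", out]
    # Only pass optional flags when they differ from the documented defaults.
    if skip_sbcd:
        cmd += ["--skip-sbcd", str(skip_sbcd)]
    if match_len != 27:
        cmd += ["--match-len", str(match_len)]
    if skip_duplicates:
        cmd.append("--skip-duplicates")
    return cmd


cmd = build_match_sbcds_cmd(
    fq="/path/to/second_seq.R1.fastq.gz",
    sbcd="/path/to/sbcd/dir/",
    out="/path/to/output_prefix",
    skip_sbcd=1,
)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) to run the tool.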
Required options
- --fq: The path to the 2nd-seq FASTQ file (read 1) containing the sequences to be matched against the spatial barcode dictionary.
- --sbcd: The path to the spatial barcode dictionary directory generated by build-sbcds or combined-sbcds. It should contain a manifest.tsv file and the corresponding barcode map for each tile listed in the manifest file.
- --out: The prefix for the output files. See Expected Output for more details.
Additional options
- --batch: The size of a single batch for the matching process. Each batch is processed and its output is written to temporary files, which are merged at the end. The default value is 300M (300000000).
- --skip-sbcd: The number of bases to skip at the beginning of the read. This is useful when insufficient bases are sequenced in the 1st-seq spatial barcode, which is reverse complemented. The default is 0.
- --skip-duplicates: Skip barcodes that occur multiple times in the barcode map. By default, one of the duplicate barcodes is arbitrarily selected and written to the output.
- --match-len: The length of the spatial barcode to consider for a match. The default is 27, which is also the maximum possible value.
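To illustrate how --skip-sbcd and --match-len might interact, the sketch below treats them as a simple slice of read 1: skip the first skip_sbcd bases, then take the next match_len bases as the lookup key. This is only one plausible reading of the option descriptions, not the tool's actual implementation, and the function name barcode_key is hypothetical.

```python
def barcode_key(read_seq, skip_sbcd=0, match_len=27):
    """Return the read segment assumed to be looked up in the barcode dictionary.

    Assumption: the options compose as read_seq[skip_sbcd : skip_sbcd + match_len].
    The real tool may handle this differently.
    """
    if match_len > 27:
        # 27 is documented as the maximum possible value.
        raise ValueError("match_len cannot exceed 27")
    end = skip_sbcd + match_len
    if len(read_seq) < end:
        return None  # the read is too short to provide a full-length key
    return read_seq[skip_sbcd:end]


# A made-up 29-base read: one leading base to skip, followed by 28 bases.
read = "N" + "ACGT" * 7
print(barcode_key(read, skip_sbcd=1, match_len=27))
```

Under this reading, skipping bases shifts the window rather than shortening it, so the read must be at least skip_sbcd + match_len bases long.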
Expected Output
With [out_prefix] as the prefix, the following files will be created:
- [out_prefix].match.sorted.uniq.tsv.gz: A compressed TSV file containing the following entries in each row:
  - Spatial barcode (alphabetically sorted)
  - Lane of spatial coordinate
  - Tile of spatial coordinate
  - X-coordinate of spatial coordinate
  - Y-coordinate of spatial coordinate
  - Number of bases that do not match the expected pattern defined by the format (0 is a perfect match)
  - Number of occurrences in the 2nd-seq FASTQ file
- [out_prefix].summary.tsv: A tab-delimited file summarizing the overall statistics of matching barcodes. It contains the following entries in each row:
  - Type: The type of statistic, which takes one of the following values:
    - Total: All reads in the 2nd-seq FASTQ file.
    - Miss: Reads that do not contain a matching spatial barcode.
    - Match: Reads that match a spatial barcode.
    - Unique: Unique spatial barcodes that have matches.
    - Dup(Exact): Duplicate barcodes, calculated as Match - Unique.
  - Reads: The number of reads or barcodes of the given type.
  - Fraction: The fraction of all reads that are of the given type.
- [out_prefix].counts.tsv: A tab-delimited file summarizing the matching barcode statistics per tile. It contains the following entries in each row:
  - id: The ID of the tile, in the format of lane_tile.
  - filepath: The name of the tsv.gz file containing the spatial barcode map.
  - barcodes: The total number of spatial barcodes found in the tile.
  - matches: The number of barcodes that match the expected format.
  - unique: The number of unique barcodes that match the expected format.
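The output files described above can be inspected with a few lines of Python. The sketch below reads the matched-barcode file, sums read occurrences per tile using the same lane_tile ID format as the counts file, and checks the Dup(Exact) = Match - Unique relationship in the summary file. It rests on several assumptions: the match file has no header line (header-like lines are skipped), the summary file has a header row naming the Type, Reads, and Fraction columns, and the short column labels in MATCH_COLUMNS are hypothetical names chosen for this example.

```python
import csv
import gzip
from collections import Counter

# Hypothetical labels for the seven documented columns, in documented order.
MATCH_COLUMNS = ["barcode", "lane", "tile", "x", "y", "mismatches", "count"]


def read_matches(path):
    """Yield one dict per data row of a *.match.sorted.uniq.tsv.gz file."""
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and anything whose last field is not a count.
            if not row or not row[-1].isdigit():
                continue
            yield dict(zip(MATCH_COLUMNS, row))


def reads_per_tile(path):
    """Sum the occurrence counts of matched barcodes for each lane_tile ID."""
    totals = Counter()
    for rec in read_matches(path):
        totals[f"{rec['lane']}_{rec['tile']}"] += int(rec["count"])
    return totals


def check_summary(path):
    """Return True if Dup(Exact) equals Match - Unique in a *.summary.tsv file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        reads = {row["Type"]: int(row["Reads"]) for row in reader}
    return reads["Dup(Exact)"] == reads["Match"] - reads["Unique"]
```

For example, reads_per_tile("/path/to/output_prefix.match.sorted.uniq.tsv.gz") returns a Counter keyed by tile ID, and check_summary("/path/to/output_prefix.summary.tsv") gives a quick sanity check on the summary metrics.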
Full Usage
The full usage of the software tool is as follows:
$ ./spatula match-sbcds --help
[./spatula match-sbcds] -- Match the FASTQ file containing spatial barcodes with the spatial barcode dictionary
Copyright (c) 2022-2024 by Hyun Min Kang
Licensed under the Apache License v2.0 http://www.apache.org/licenses/
Detailed instructions of parameters are available. Ones with "[]" are in effect:
Available Options:
== Input options ==
--fq [STR: ] : FASTQ file read 1 containing 2nd-seq spatial barcode
--sbcd [STR: ] : Spatial barcode dictionary generated from 'build-sbcds' command
--batch [INT: 300000000] : Size of a single batch
--skip-sbcd [INT: 0] : Skip first bases of spatial barcode (Read 1)
--match-len [INT: 27] : Length of HDMI spatial barcodes to require perfect matches
--skip-duplicates [FLG: OFF] : Skip duplicate barcodes that occurs multiple times
== Output Options ==
--out [STR: ] : Output prefix (index.tsv, matches.tsv.gz)
NOTES:
When --help was included in the argument. The program prints the help message but do not actually run