spatula build-sbcds¶
Summary¶
spatula build-sbcds
creates a spatial barcode dictionary based from an Illumina FASTQ file. Here is a summary of the main features:
- Input: Takes a single-ended Illumina-sequenced FASTQ file containing the original read names as input.
- Output: Produces a map of spatial barcodes (i.e. a mapping between barcode sequences and spatial coordinates), separated by each tile. It also generates a manifest file that summarizes the output for each tile.
An example usage of the tool is as follows:
spatula build-sbcds --fq1 /path/to/input_file.R1.fastq.gz --format DraI32 \
--out /path/to/output/dir/
See below for a more detailed usage description.
Required options¶
--fq
: The path to the (single-ended) input FASTQ file. The read name must include the spatial coordinate information. (i.e. SRA-processed read names must not be used)--format
: Predefined spatial barcode format. It specifies the expected patterns of spatial barcodes, and whether reverse complement is required or not. A limited number of barcode patterns, including experimental ones, are accepted. The most popular choices are:DraI32
: 32-bp barcode format from the Seq-Scope paper. The expected pattern in IUPAC codes isNNNNNBNNBNNBNNBNNBNNBNNBNNBVNBNN
; reverse complement is performed.HDMI20
: 20-bp barcode format from the original Seq-Scope paper. This is intended for a small array area; reverse complement is performed.DraI31
: 31-bp barcode format, almost identical toDraI32
. This format is useful for one of the example sequence data we provided, which sequenced only the first 31-bp ofDraI32
format. The expected pattern in IUPAC codes isNNNNNBNNBNNBNNBNNBNNBNNBNNBVNBN
; reverse complement is performed.
--out
: Output directory that stores the spatial barcode dictionary per tile. See Expected Output for more details.
Additional Options¶
--platform
: Platform on which the FASTQ file is generated. This specifies the expected format of the read name. The key fields extracted areLANE
,TILE
,X
,Y
coordinates. Currently, the only officially supported format isIllumina
(Other experimental formats are not documented here and will not be supported).Illumina
: Illumina read name should have the following format:[INSTRUMENT_ID]:[RUN_ID]:[FLOWCELL_ID]:[LANE]:[TILE]:[X]:[Y]
.
--force-lane
: (For experimental platforms only) Force the lane number to a certain value; this is useful for certain platforms where the read name does not contain the lane information.
Expected Output¶
In the output directory [outdir]
, the following files will be created.
[outdir]/manifest.tsv
contains the list of output tsv files. Each line contains the following information:id
: ID of the tile as[lane]_[tile]
filepath
: Output file name associated with the tile[lane]_[tile].sbcds.sorted.tsv.gz
. Note that the output is an unsorted tsv file, which must be sorted externally.barcodes
: The total number of spatial barcodes found in the tile.matches
: The number of barcodes matching the expected format.mismatches
: The number of barcodes that do not match the expected format.xmin
: The minimum value of the x coordinate per tilexmax
: The maximum value of the x coordinate per tileymin
: The minimum value of the y coordinate per tile.ymax
: The maximum value of the y coordinate per tile.
- For each tile,
[outdir]/[lane]_[tile].sbcds.unsorted.tsv
contains the barcode sequences and their spatial coordinates in tsv format. Note that the output is an unsorted tsv file, which must be sorted alphabetically and compressed into[lane]_[tile].sbcds.sorted.tsv.gz
afterbuild-sbcds
has finished. Each column in the tsv file contains the following information:- Spatial barcode sequences (in reverse complement if the specified format requires it).
- Lane
- Tile
- x-coordinate
- y-coordinate
- Number of bases that do not match the expected pattern defined by the format (0 is a perfect match).
Full Usage¶
The full usage of spatula build-sbcds
can be viewed with the --help
option:
$ ./spatula build-sbcds --help
[./spatula build-sbcds] -- Create spatial barcode dictionary based from HDMI FASTQ arrays
Copyright (c) 2022-2024 by Hyun Min Kang
Licensed under the Apache License v2.0 http://www.apache.org/licenses/
Detailed instructions of parameters are available. Ones with "[]" are in effect:
Available Options:
== Input options ==
--fq [STR: ] : Input FASTQ file that contains spatial barcode in the readname
--format [STR: ] : Format of the spatial barcodes (e.g. DraI32, HDMI20, T7-30, etc)
--platform [STR: Illumina] : Platform of the sequencing data to determine the rule to parse the readnames (e.g. Illumina)
--force-lane [INT: 0] : Force a lane number. Required a positive value for Salus/SalusGlobal platforms
== Output Options ==
--out [STR: ] : Output directory name
NOTES:
When --help was included in the argument. The program prints the help message but do not actually run