spatula join-pixel-tsv¶

Summary¶

spatula join-pixel-tsv combines the raw input data used in FICTURE analysis with the pixel-level output generated by FICTURE into a single TSV file.

The main reason for creating this tool is that the pixel-level output file (in TSV format) produced by FICTURE is hard for users to work with directly. The image output (PNG file) generated by FICTURE is straightforward to interpret, but the TSV file may not be, for the following reasons:

The pixel-level output is sorted by two spatial coordinates (X and Y) in a complex way to navigate mini-batches efficiently.
The X/Y coordinates in the pixel-level output need to be transformed before they can be interpreted in the original spatial coordinate system.
There can be duplicate information near the boundaries between mini-batches.
Some spatial coordinates within a certain distance of each other (e.g. <0.1um or <0.01um) are collapsed together for efficient processing, depending on the parameter settings used.
The pixel-level output does not provide the raw transcript information observed at each spatial position.
Some of the spatial coordinates located in very sparse regions may have been filtered out in the pixel-level output.

For these reasons, there have been requests to provide pixel-level inference from FICTURE by combining the raw input data with the pixel-level output. spatula join-pixel-tsv is intended to provide the information by post-processing the pixel-level output from FICTURE. While the tool relies on some heuristics, we believe that the output should be reasonable as long as FICTURE is run with typical settings.

For each spatial coordinate in the raw input data, spatula join-pixel-tsv identifies the closest point in each pixel-level output (as long as the distance is within a certain threshold) and annotates that coordinate with the corresponding factors inferred by FICTURE. Because FICTURE's pixel-level output is spatially smooth, the result should be almost identical to running FICTURE on individual spatial coordinates without collapsing nearby points. This procedure should also remove any duplicate points between mini-batches in FICTURE's pixel-level output.
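The matching rule described above can be illustrated with a small brute-force sketch. This is not the tool's implementation (which relies on sorted inputs and binning); it only shows the idea of "nearest pixel within a threshold, otherwise NA". The helper name nearest_factor, the three-column toy file layouts, and the gene names are all assumptions made for illustration.

```shell
# nearest_factor PIXELS TRANSCRIPTS MAXDIST (hypothetical helper, brute force).
# PIXELS:      tab-separated x, y, factor
# TRANSCRIPTS: tab-separated x, y, gene
# Prints each transcript followed by the factor of its nearest pixel,
# or NA when no pixel lies within MAXDIST.
nearest_factor() {
    awk -F'\t' -v OFS='\t' -v maxd="$3" '
        # First file: load all pixel points into arrays.
        NR == FNR { px[NR] = $1; py[NR] = $2; pk[NR] = $3; n = NR; next }
        # Second file: scan every pixel for the closest one within maxd.
        {
            best = -1; label = "NA"
            for (i = 1; i <= n; i++) {
                dx = $1 - px[i]; dy = $2 - py[i]
                d = sqrt(dx * dx + dy * dy)
                if (d <= maxd && (best < 0 || d < best)) { best = d; label = pk[i] }
            }
            print $1, $2, $3, label
        }' "$1" "$2"
}

# Toy inputs (made up, not real FICTURE data).
demo=$(mktemp -d)
printf '10.0\t10.0\t4\n10.5\t10.0\t8\n' > "$demo/pixels.tsv"
printf '10.1\t10.0\tGeneA\n10.45\t10.0\tGeneB\n20.0\t20.0\tGeneC\n' > "$demo/transcripts.tsv"

nearest_factor "$demo/pixels.tsv" "$demo/transcripts.tsv" 0.2
```

With these toy inputs, GeneA and GeneB pick up factors 4 and 8 respectively, while GeneC has no pixel within 0.2 and is reported as NA.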

Usage¶

Sorting the pixel-level output from FICTURE¶

The spatula join-pixel-tsv tool assumes that the pixel-level output is sorted by the major axis, consistent with the input data used in the FICTURE analysis. For example, if the input file (typically transcripts.sorted.tsv.gz) is sorted by the X-coordinate, the pixel-level output must be sorted by the X-coordinate as well. The pixel-level output TSV files generated by FICTURE are typically NOT sorted by a single coordinate, so you will need to re-sort them by the same major axis as transcripts.sorted.tsv.gz. Assuming ${PREFIX}.pixel.sorted.tsv.gz is the pixel-level output from FICTURE, the following command does this:

# ${PREFIX}.pixel.sorted.tsv.gz is the pixel-level output from FICTURE
# This example assumes that the major axis is the X-axis.
# If the major axis is the Y-axis, use -gk3 instead of -gk2.
# First subcommand: keep the '#' header lines found in the first 10 lines.
# Second subcommand: sort the remaining lines numerically by the major axis.
(gzip -cd "${PREFIX}.pixel.sorted.tsv.gz" \
    | head | grep '^#'; \
    gzip -cd "${PREFIX}.pixel.sorted.tsv.gz" \
    | grep -v '^#' | sort -S 1G -gk2) \
    | gzip -c > "${PREFIX}.pixel.sorted_by_major_axis.tsv.gz"
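Before joining, it may be worth confirming that a re-sorted file really is in non-decreasing order of the major axis. The following is a minimal sketch, assuming the major-axis coordinate sits in column 2 (as implied by -gk2 in the command above) and that header lines start with '#'. The helper name check_sorted and the toy file are hypothetical.

```shell
# check_sorted FILE COL (hypothetical helper): succeeds only if column COL
# (1-based) of a gzipped, whitespace-delimited file is in non-decreasing
# numeric order. Lines starting with '#' are treated as headers and skipped.
check_sorted() {
    gzip -cd "$1" | awk -v col="$2" '
        /^#/ { next }
        seen && ($col + 0) < prev {
            print "out of order at line " NR > "/dev/stderr"
            bad = 1
            exit
        }
        { prev = $col + 0; seen = 1 }
        END { exit (bad ? 1 : 0) }'
}

# Toy file with a made-up header and three rows (not real FICTURE output).
tmpdir=$(mktemp -d)
printf '#toy header\n0\t1.5\t9.0\n1\t2.0\t3.0\n2\t10.0\t1.0\n' \
    | gzip -c > "$tmpdir/toy.tsv.gz"

check_sorted "$tmpdir/toy.tsv.gz" 2 && echo "column 2 is sorted"
check_sorted "$tmpdir/toy.tsv.gz" 3 || echo "column 3 is not sorted"
```

The comparison is numeric rather than lexical, so a value such as 10.0 correctly sorts after 2.0, matching the behavior of sort -g.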

Running `spatula join-pixel-tsv` with multiple pixel-level outputs from FICTURE¶

The spatula join-pixel-tsv tool can be run with multiple pixel-level outputs from FICTURE, combining them with the input data into a single TSV file. As an example, assume that there are two pixel-level outputs from FICTURE, namely nF12.d_12.decode.prj_12.r_4_5.pixel.sorted.tsv.gz and nF24.d_12.decode.prj_12.r_4_5.pixel.sorted.tsv.gz. Using the sorting procedure described above, you can sort each of them by the major axis and save the results as nF12.d_12.decode.prj_12.r_4_5.pixel.sorted_by_major_axis.tsv.gz and nF24.d_12.decode.prj_12.r_4_5.pixel.sorted_by_major_axis.tsv.gz.

We assume that we want to combine the following three files:

transcripts.sorted.tsv.gz : The generic TSV input file used in the FICTURE analysis, sorted by the major axis.
nF12.d_12.decode.prj_12.r_4_5.pixel.sorted_by_major_axis.tsv.gz : A pixel-level output from FICTURE, sorted by the major axis. We want to use the prefix nF12__ in the output.
nF24.d_12.decode.prj_12.r_4_5.pixel.sorted_by_major_axis.tsv.gz : Another pixel-level output from FICTURE, sorted by the major axis. We want to use the prefix nF24__ in the output.

To join these files together, you may run the following command:

## The main output file will be named transcripts_ficture_joined.tsv.gz
spatula join-pixel-tsv \
    --mol-tsv transcripts.sorted.tsv.gz \
    --pix-prefix-tsv nF12__,nF12.d_12.decode.prj_12.r_4_5.pixel.sorted_by_major_axis.tsv.gz \
    --pix-prefix-tsv nF24__,nF24.d_12.decode.prj_12.r_4_5.pixel.sorted_by_major_axis.tsv.gz \
    --out-prefix transcripts_ficture_joined \
    --max-dist-um 0.2

Then transcripts_ficture_joined.tsv.gz and auxiliary files will be produced.

Example output may look like this:

X   Y   gene    gn  nF12__K1    nF12__P1    nF24__K1    nF24__P1
187.35  3904.17 Fmod    1   NA  NA  NA  NA
187.37  3822.76 Aqp4    1   4   1   8   1
187.39  3798.29 Gfap    1   NA  NA  NA  NA
187.45  3866.88 Igfbp5  1   NA  NA  NA  NA
187.46  3819.01 Igfbp5  1   NA  NA  NA  NA
187.49  3820.90 Ntsr2   1   NA  NA  NA  NA
187.62  3822.98 Igf2    1   4   1   8   1
187.68  3946.95 Vat1l   1   4   1   8   1
187.77  3822.63 Gng12   1   4   0.614   4   1
187.81  3844.77 Angpt1  1   4   1   8   1
187.81  3845.66 Carmn   1   4   1   8   1
187.86  3823.45 Slc39a12    1   2   0.529   4   0.999
...

Note that you may observe many lines with NA at the beginning of the file. These transcripts tend to be located in very sparse regions that were excluded from the FICTURE analysis.
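To gauge how many transcripts were left unannotated, one option is to count the NA values in one of the factor columns. The sketch below assumes a tab-separated file with a single header line and a column named nF12__K1, as in the example output above. The helper name count_na and the toy file (built from four rows of that example) are hypothetical.

```shell
# count_na FILE COLNAME (hypothetical helper): prints "<NA rows> <total rows>"
# for the named column of a gzipped TSV with a one-line header.
count_na() {
    gzip -cd "$1" | awk -F'\t' -v name="$2" '
        NR == 1 {
            # Locate the requested column by its header name.
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            if (!col) { print "column not found: " name > "/dev/stderr"; exit 2 }
            next
        }
        { total++; if ($col == "NA") na++ }
        END { if (col) print na + 0, total + 0 }'
}

# Toy joined file assembled from rows of the example output above.
joined_dir=$(mktemp -d)
{
    printf 'X\tY\tgene\tgn\tnF12__K1\tnF12__P1\tnF24__K1\tnF24__P1\n'
    printf '187.35\t3904.17\tFmod\t1\tNA\tNA\tNA\tNA\n'
    printf '187.37\t3822.76\tAqp4\t1\t4\t1\t8\t1\n'
    printf '187.39\t3798.29\tGfap\t1\tNA\tNA\tNA\tNA\n'
    printf '187.62\t3822.98\tIgf2\t1\t4\t1\t8\t1\n'
} | gzip -c > "$joined_dir/toy_joined.tsv.gz"

result=$(count_na "$joined_dir/toy_joined.tsv.gz" nF12__K1)
echo "$result"   # prints "2 4": 2 NA rows out of 4
```

On a real joined file, a large NA fraction beyond the sparse leading region might suggest that --max-dist-um is too small or that the two inputs are not sorted on the same axis.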

Command line arguments¶

Key options¶

--mol-tsv : Generic TSV file sorted by the major axis. Typically transcripts.sorted.tsv.gz.
--pix-prefix-tsv : (Can be used multiple times) A string in [prefix],[filename] format that specifies the prefix and the pixel-level output file (sorted by the major axis). The prefix will be used to annotate the columns in the output file.
--out-prefix : Prefix of output files.
--max-dist-um : The maximum distance (in um) between a spatial coordinate in the transcripts and a point in FICTURE's pixel-level decoding output for the two to be considered a match (default: 0.5)
--out-max-k : The maximum number of pixel-level factors to be included in the joined output (default: 1)
--out-max-p : The maximum number of posterior probabilities to be included in the joined output (default: 1)
--sort-axis : The major axis used to sort the input data. Default is X. If the major axis is Y, use Y instead.

Additional options¶

--bin-um : The unit of binning (in um) to search for nearest match. Default value 1um is recommended.
--colname-x : Column name for X-coordinate in the output file (default is X).
--colname-y : Column name for Y-coordinate in the output file (default is Y).
--colnames-include : Comma-separated column names to include in the output TSV file.
--colnames-exclude : Comma-separated column names to exclude in the output TSV file.
--out-suffix-tsv : Suffix for the output TSV file (default: .tsv.gz)
--out-suffix-hist : Suffix for the output histogram file of best matching distances (default: .dist.hist.tsv)

Expected Output¶

If the output prefix is ${OUT_PREFIX}, the following files will be created with the default suffixes.

${OUT_PREFIX}.tsv.gz : A generic TSV file that has the same columns as the input file (unless some columns are excluded), plus additional columns for each pixel-level output that give the matching factor and posterior probability.
${OUT_PREFIX}.dist.hist.tsv : A histogram file that shows the distribution of the best matching distances between the spatial coordinates in the input file and the pixel-level output from FICTURE.

Full Usage¶

The full usage of the software tool is as follows:

$ ./spatula join-pixel-tsv --help       
[./spatula join-pixel-tsv] -- Join pixel-level output from FICTURE with raw transcript-level TSV files

 Copyright (c) 2022-2024 by Hyun Min Kang
 Licensed under the Apache License v2.0 http://www.apache.org/licenses/

Detailed instructions of parameters are available. Ones with "[]" are in effect:

Available Options:

== Key Input/Output Options ==
   --mol-tsv            [STR: ]             : TSV file containing individual molecules
   --pix-prefix-tsv     [V_STR: ]           : TSV file containing pixel-level factors
   --out-prefix         [STR: ]             : Output prefix for the joined TSV files

== Key Parameters ==
   --bin-um             [FLT: 1.00]         : Bin size for grouping the pixel-level output for indexing
   --max-dist-um        [FLT: 0.50]         : Maximum distance in um to consider a match

== Expected columns in input and output ==
   --colname-x          [STR: X]            : Column name for X-axis
   --colname-y          [STR: Y]            : Column name for Y-axis
   --sort-axis          [STR: X]            : Column name used in sorting. Both files must be sorted in the same axis (default: X-axis)
   --colnames-include   [STR: ]             : Comma-separated column names to include in the output TSV file
   --colnames-exclude   [STR: ]             : Comma-separated column names to exclude in the output TSV file
   --out-max-k          [INT: 1]            : Maximum number of pixel-level factors to include in the joined output. (Default : 1)
   --out-max-p          [INT: 1]            : Maximum number of pixel-level posterior probabilities to include in the joined output. (Default : 1)

== Output File suffixes ==
   --out-suffix-tsv     [STR: .tsv.gz]      : Suffix for the output TSV file
   --out-suffix-hist    [STR: .dist.hist.tsv] : Suffix for the histogram of match distance
   --out-suffix-summary [STR: .summary.tsv] : Suffix for the summary file


NOTES:
When --help was included in the argument. The program prints the help message but do not actually run