Spatial Gene Expression (SGE) Format Conversion¶
Overview¶
SGE datasets vary widely in format and resolution across platforms. Since FICTURE requires SGE in a specific format, the cartloader toolkit provides the sge_convert module to standardize raw, platform-specific SGE into a FICTURE-compatible format.
Requirements¶
Input Data Requirement:
The input data (raw, platform-specific SGE) should be a transcript-indexed SGE containing at least:
- Spatial coordinates (X coordinates, Y coordinates)
- Feature metadata (such as gene symbols)
- Expression Counts
Platform Compatibility
The current sge_convert supports standardizing SGE from the following platforms:
Source | --platform Option | Required Input Files |
---|---|---|
10x Visium HD | 10x_visium_hd | --in-mex, --in-parquet, --scale-json |
Seq-Scope | seqscope | --in-mex |
10x Xenium | 10x_xenium | --in-csv |
Stereo-seq | bgi_stereoseq | --in-csv |
CosMx SMI | cosmx_smi | --in-csv |
Vizgen MERSCOPE | vizgen_merscope | --in-csv |
Pixel-seq | pixel_seq | --in-csv |
Generic CSV/TSV input 1 | generic | --in-csv |
1: For SGE from platforms not yet explicitly supported by cartloader, or from custom/preprocessed sources, sge_convert provides a generic option that accepts CSV/TSV files with basic required fields (e.g., gene, spatial coordinates, expression count) for standardization and processing.
Example Usages¶
Input SGE in MEX Format¶
Seq-Scope¶
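As a minimal sketch, a Seq-Scope conversion might be assembled from Python as shown here. It uses only flags documented in the Parameters section. The `cartloader sge_convert` invocation form, all paths, and the `--units-per-um` value are assumptions to be replaced with values for your own setup.

```python
import shlex

# Placeholder paths (assumptions); point these at your own data.
in_mex_dir = "/path/to/seqscope/mex"
out_dir = "/path/to/output/sge"

# Assumption: the module is invoked as `cartloader sge_convert`.
seqscope_cmd = [
    "cartloader", "sge_convert",
    "--platform", "seqscope",
    "--in-mex", in_mex_dir,
    "--units-per-um", "1.0",       # placeholder; use the value for your dataset
    "--icols-mtx", "1",            # documented default
    "--colnames-count", "count",   # documented default
    "--out-dir", out_dir,
    "--sge-visual",                # optional: also draw the SGE image
]

# Print a copy-pasteable shell command. To execute it instead, use
# subprocess.run(seqscope_cmd, check=True).
print(shlex.join(seqscope_cmd))
```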
10X Visium HD¶
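For 10X Visium HD, the platform table lists three required inputs: `--in-mex`, `--in-parquet`, and `--scale-json`. The sketch below shows one way such a call might be put together. The directory layout (including the `mex` subdirectory name) and the `cartloader sge_convert` invocation form are assumptions; only the two file names come from the parameter descriptions.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical directory holding the Visium HD outputs (assumption).
visium_dir = PurePosixPath("/path/to/visium_hd")

visium_hd_cmd = [
    "cartloader", "sge_convert",   # assumed invocation form
    "--platform", "10x_visium_hd",
    "--in-mex", str(visium_dir / "mex"),  # hypothetical subdirectory name
    "--in-parquet", str(visium_dir / "tissue_positions.parquet"),
    "--scale-json", str(visium_dir / "scalefactors_json.json"),
    "--out-dir", "/path/to/output/sge",
]

# --units-per-um is omitted on purpose: it is skipped when --scale-json is given.
print(shlex.join(visium_hd_cmd))
```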
Input SGE in TSV/CSV Format¶
This applies to input SGE in TSV/CSV format from platforms including 10X Xenium, StereoSeq, CosMx SMI, MERSCOPE, and Pixel-seq. To simplify preprocessing, sge_convert automatically applies platform-specific defaults for common CSV/TSV parameters.
Below is an example converting SGE from StereoSeq.
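A StereoSeq call might look roughly like the following sketch. Because the platform defaults cover the delimiter and column names, no `--csv-*` flags are passed. The `cartloader sge_convert` invocation form, the input file name, and the `--units-per-um` value are assumptions.

```python
import shlex

stereoseq_cmd = [
    "cartloader", "sge_convert",   # assumed invocation form
    "--platform", "bgi_stereoseq",
    "--in-csv", "/path/to/stereoseq/input.tsv.gz",  # placeholder file name
    "--units-per-um", "1.0",       # placeholder; use the value for your dataset
    "--out-dir", "/path/to/output/sge",
    "--filter-by-density",         # optional density-based filtering
    "--out-filtered-prefix", "filtered",
]

print(shlex.join(stereoseq_cmd))
```

If the header of your file differs from the StereoSeq defaults, append the relevant overrides, for example `"--csv-colname-feature-name", "<your column>"`.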
Verify your input structure
Always double-check the column names and metadata format of your input files. If they differ from the expected defaults, override them using the --csv-* and --min-phred-score options.
Platform-specific default settings
To streamline the process, sge_convert automatically applies platform-dependent defaults for CSV/TSV parsing based on known file formats and column conventions.
The table below summarizes the default values for key parameters per supported platform:
Platform | --csv-comment 1 | --csv-delim | --csv-colname-x | --csv-colname-y | --csv-colnames-count | --csv-colname-feature-name |
---|---|---|---|---|---|---|
10X Xenium 2 | False | , | x_location | y_location | - | feature_name |
StereoSeq | False | \t | x | y | MIDCounts | geneID |
CosMx SMI | False | , | x_local_px | y_local_px | - | target |
MERSCOPE | False | , | global_x | global_y | - | gene |
Pixel-seq | False | \t | xcoord | ycoord | - | geneName |
1 --csv-comment: If True, lines starting with # are treated as comments and skipped.
2 10X Xenium: Besides the above default settings, for 10X Xenium data, sge_convert also applies --csv-colname-phredscore qv and --min-phred-score 20.
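To check an input file against these defaults before converting, a small helper along the following lines can be used. It is an independent sketch, not part of cartloader. The dictionary simply transcribes the table above, and `missing_columns` is a hypothetical helper name.

```python
import csv
import io

# Defaults transcribed from the table above. None means the platform has no
# count column, in which case each row counts as one transcript.
PLATFORM_DEFAULTS = {
    "10x_xenium":      {"delim": ",",  "x": "x_location", "y": "y_location", "count": None,        "feature": "feature_name"},
    "bgi_stereoseq":   {"delim": "\t", "x": "x",          "y": "y",          "count": "MIDCounts", "feature": "geneID"},
    "cosmx_smi":       {"delim": ",",  "x": "x_local_px", "y": "y_local_px", "count": None,        "feature": "target"},
    "vizgen_merscope": {"delim": ",",  "x": "global_x",   "y": "global_y",   "count": None,        "feature": "gene"},
    "pixel_seq":       {"delim": "\t", "x": "xcoord",     "y": "ycoord",     "count": None,        "feature": "geneName"},
}


def missing_columns(header_line, platform):
    """Return the default column names for `platform` that the header lacks."""
    defaults = PLATFORM_DEFAULTS[platform]
    header = next(csv.reader(io.StringIO(header_line), delimiter=defaults["delim"]))
    expected = [defaults[key] for key in ("x", "y", "count", "feature")
                if defaults[key] is not None]
    return [name for name in expected if name not in header]


print(missing_columns("geneID\tx\ty\tMIDCounts", "bgi_stereoseq"))  # []
print(missing_columns("gene,x,y", "vizgen_merscope"))  # ['global_x', 'global_y']
```

Any name reported as missing is a candidate for the matching `--csv-colname-*` override.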
Actions¶
SGE Conversion¶
Converts SGE into a FICTURE-compatible TSV format. During conversion, SGE coordinates are rescaled to micrometer units based on the pixel resolution specified in the input. Features (typically genes) can optionally be filtered.
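The rescaling step can be illustrated with a short sketch. It assumes the conversion is a plain division by `--units-per-um` (coordinate units per micrometer) followed by rounding to `--precision-um` decimals. The tool's internal implementation may differ, and `to_micrometers` is a hypothetical name.

```python
def to_micrometers(x, y, units_per_um=1.0, precision_um=2):
    """Convert native coordinates to micrometers and round them.

    Assumption: micrometers = native units / units_per_um.
    """
    if units_per_um <= 0:
        raise ValueError("units_per_um must be positive")
    return (round(x / units_per_um, precision_um),
            round(y / units_per_um, precision_um))


# With 4 coordinate units per micrometer, (1000, 2500) maps to (250.0, 625.0).
print(to_micrometers(1000, 2500, units_per_um=4.0))  # (250.0, 625.0)
```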
(Optional) Density-based Filtering¶
Automatically identifies and retains high-quality tissue regions based on transcript density and spatial structure. This step takes the format-standardized SGE as input and generates a density-based filtered SGE.
(Optional) SGE Visualization¶
Draws an image of the 2D points provided as input. In this step, the --north-up option can optionally be enabled to ensure correct spatial orientation (i.e., the Y-axis increases upward/north and the X-axis increases to the right/east).
Parameters¶
The following outlines the minimum required parameters.
For most auxiliary parameters, the default values are recommended; modify them only when they do not suit your use case. See more details in the collapsible sections below or in the command's help message.
SGE Conversion¶
- --platform (str): Source platform used to infer the input file format and default settings (options: "10x_visium_hd", "seqscope", "10x_xenium", "bgi_stereoseq", "cosmx_smi", "vizgen_merscope", "pixel_seq", "generic").
- --in-mex (str): Path to the input SGE directory in MEX format. Required for MEX-formatted data (e.g., 10X Visium HD, Seq-Scope).
- --in-csv (str): Path to the input SGE file in CSV or TSV format. Required for CSV/TSV-formatted data (e.g., 10X Xenium, StereoSeq, CosMx SMI, MERSCOPE, Pixel-seq).
- --in-parquet (str): Path to the input parquet file with spatial coordinates, if available. Typically named tissue_positions.parquet (10X Visium HD).
- --scale-json (str): Path to the scale JSON file used to compute --units-per-um, if available. Typically named scalefactors_json.json (10X Visium HD).
- --units-per-um (float): Coordinate units per micrometer (default: 1.00). Skip if --scale-json is provided.
- --out-dir (str): Path to the output directory.
- --include-feature-regex (regex): (Optional) A regex pattern of feature/gene names to be included.
- --exclude-feature-regex (regex): (Optional) A regex pattern of feature/gene names to be excluded.
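The effect of the two feature regex options can be previewed on a gene list before running the conversion. The sketch below assumes search-style matching (`re.search`) and that exclusion is applied after inclusion; the tool's exact matching semantics are not stated here, so treat both as assumptions. The gene names and pattern are made up for illustration.

```python
import re


def keep_feature(name, include_regex=None, exclude_regex=None):
    """Return True if `name` passes the include and exclude patterns."""
    if include_regex is not None and not re.search(include_regex, name):
        return False
    if exclude_regex is not None and re.search(exclude_regex, name):
        return False
    return True


genes = ["Actb", "mt-Co1", "Gm1234", "BLANK_0001"]
# Illustrative pattern: drop mitochondrial genes, Gm-numbered genes, and blanks.
pattern = r"^(mt-|Gm\d+$|BLANK_)"
kept = [g for g in genes if keep_feature(g, exclude_regex=pattern)]
print(kept)  # ['Actb']
```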
Auxiliary SGE Conversion Parameters
Auxiliary Input MEX Parameters:
- --icols-mtx (int or comma-separated list): Comma-separated, 1-based indices of the target genomic features among the count columns in the input matrix file. (Default: 1)
- --colnames-count (string or comma-separated list): Comma-separated output column names for the specified genomic features. (Default: count). The number of names specified by --colnames-count must match the number of indices provided in --icols-mtx.
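The pairing rule between `--icols-mtx` and `--colnames-count` can be checked ahead of time with a sketch like this one. `pair_mtx_columns` is a hypothetical helper, and the three column names in the example are made up.

```python
def pair_mtx_columns(icols_mtx="1", colnames_count="count"):
    """Map each output column name to its 1-based matrix column index."""
    indices = [int(i) for i in icols_mtx.split(",")]
    names = colnames_count.split(",")
    if len(indices) != len(names):
        raise ValueError(
            f"{len(names)} name(s) given for {len(indices)} index(es)")
    if any(i < 1 for i in indices):
        raise ValueError("indices are 1-based and must be >= 1")
    return dict(zip(names, indices))


print(pair_mtx_columns())                          # {'count': 1}
print(pair_mtx_columns("1,2,3", "featA,featB,featC"))  # {'featA': 1, 'featB': 2, 'featC': 3}
```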
Auxiliary Input CSV/TSV Parameters:
- --csv-comment (flag): If enabled, lines starting with # will be skipped (default: False for 10X Xenium, StereoSeq, CosMx SMI, MERSCOPE, and Pixel-seq).
- --csv-delim (str): Delimiter for the input file (default: "," for 10X Xenium, CosMx SMI, and MERSCOPE; "\t" for StereoSeq and Pixel-seq).
- --csv-colname-x (str): Column name for X coordinates (default: x_location for 10X Xenium; x for StereoSeq; x_local_px for CosMx SMI; global_x for MERSCOPE; xcoord for Pixel-seq).
- --csv-colname-y (str): Column name for Y coordinates (default: y_location for 10X Xenium; y for StereoSeq; y_local_px for CosMx SMI; global_y for MERSCOPE; ycoord for Pixel-seq).
- --csv-colnames-count (str): Comma-separated column names for expression count. If not provided, a count of 1 per transcript is used (default: MIDCounts for StereoSeq).
- --csv-colname-feature-name (str): Column name for gene name (default: feature_name for 10X Xenium; geneID for StereoSeq; target for CosMx SMI; gene for MERSCOPE; geneName for Pixel-seq).
- --csv-colnames-others (str): Column names to keep.
- --csv-colname-phredscore (str): Column name for the Phred-scaled quality value estimating the probability of incorrect calls (default: qv for 10X Xenium).
- --min-phred-score (int): Phred-scaled quality score cutoff (default: 20 for 10X Xenium).
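For orientation on the Phred cutoff: a Phred-scaled score Q corresponds to an error probability of 10^(-Q/10), so the Xenium default of 20 corresponds to a 1% chance of an incorrect call. The sketch below illustrates this relationship. Whether the cutoff is inclusive is an assumption, and both function names are hypothetical.

```python
def phred_to_error_prob(q):
    """Error probability implied by a Phred-scaled quality score."""
    return 10 ** (-q / 10)


def passes_min_phred(qv, min_phred_score=20):
    """Assumption: scores equal to the cutoff are kept."""
    return qv >= min_phred_score


for qv in (10, 20, 40):
    print(qv, phred_to_error_prob(qv), passes_min_phred(qv))
```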
Auxiliary Output Parameters:
- --out-transcript (str): File name for the output compressed transcript-indexed SGE file in TSV format (default: transcripts.unsorted.tsv.gz).
- --out-minmax (str): File name for coordinate min-max values in TSV format (default: coordinate_minmax.tsv).
- --out-feature (str): File name for compressed UMI count per gene in TSV format (default: feature.clean.tsv.gz).
- --precision-um (int): Decimal precision for transcript coordinates; set to 0 to round to integers (default: 2).
- --colname-x (str): Column name for the X-coordinate in the output SGE (default: X).
- --colname-y (str): Column name for the Y-coordinate in the output SGE (default: Y).
- --colnames-count (str): Comma-separated column names for expression count in the output SGE (default: count).
- --colname-feature-name (str): Column name for the gene name in the output SGE (default: gene).
Auxiliary Environment Parameters
If the binaries are already available in your system's PATH, you may omit these options.
- --gzip (str): Path to gzip binary; consider pigz -p 4 for faster processing. (Default: gzip)
- --spatula (str): Path to spatula binary. (Default: spatula)
- --parquet-tools (str): Required if --in-parquet is used; path to parquet-tools binary. (Default: parquet-tools)
(Optional) Density-based Filtering¶
- --filter-by-density (flag): Enable filtering of SGE by density.
- --out-filtered-prefix (str): Prefix for output filtered SGE files (default: filtered).
- --genomic-feature (str): Genomic feature to be used for density-based filtering. Defaults to the value of --colnames-count if only one column name is provided.
Auxiliary Density-based Filtering Parameters
- --mu-scale (float): Scale factor for the polygon area calculation (default: 1.0).
- --radius (int): Radius for the polygon area calculation (default: 15).
- --quartile (int): Quartile for the polygon area calculation (default: 2).
- --hex-n-move (int): Sliding step (default: 1).
- --polygon-min-size (int): The minimum polygon size (default: 500).
(Optional) SGE Visualization¶
- --sge-visual (flag): Enable SGE visualization.
- --north-up (flag): Enable the north-up orientation for the SGE visualization.
Auxiliary SGE Visualization Parameters
- --out-xy (str): File name for the output SGE visualization image (default: xy.png).
- --out-northup-tif (str): File name for the output north-up oriented image (default: xy_northup.tif).
- --srs (str): If --north-up is enabled, defines the spatial reference system (default: EPSG:3857).
- --resample (str): If --north-up is enabled, defines the resampling method (default: cubic). Options: near, bilinear, cubic, etc.
- --gdal_translate (str): Required if --north-up is enabled; path to gdal_translate binary. (Default: gdal_translate)
- --gdalwarp (str): Required if --north-up is enabled; path to gdalwarp binary. (Default: gdalwarp)
Output¶
cartloader generates the following harmonized outputs:
Unified SGE matrix¶
Both SGE conversion and density-based filtering generate a unified SGE matrix, consisting of:
transcripts.unsorted.tsv.gz: transcript-indexed SGE in TSV
- X: X coordinates in um
- Y: Y coordinates in um
- gene: gene symbols
- count: expression count per pixel per gene
feature.clean.tsv.gz: UMI counts on a per-gene basis in TSV
- gene: gene symbols
- gene_id: gene IDs
- count: expression count per gene
coordinate_minmax.tsv: min and max X and Y coordinates
- xmin, xmax: min and max X coordinates in um
- ymin, ymax: min and max Y coordinates in um
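The relationship between the three files can be illustrated by recomputing the per-gene totals and coordinate bounds from the transcript table. This is a rough sketch for sanity-checking outputs, assuming the default column names (X, Y, gene, count) and a tab-delimited layout. The function names and the demo rows are made up for illustration.

```python
import csv
import gzip
import io
from collections import Counter


def summarize_transcripts(handle):
    """Return (per-gene counts, coordinate bounds) from a tab-delimited handle."""
    gene_counts = Counter()
    bounds = {"xmin": float("inf"), "xmax": float("-inf"),
              "ymin": float("inf"), "ymax": float("-inf")}
    for row in csv.DictReader(handle, delimiter="\t"):
        x, y = float(row["X"]), float(row["Y"])
        gene_counts[row["gene"]] += int(row["count"])
        bounds["xmin"] = min(bounds["xmin"], x)
        bounds["xmax"] = max(bounds["xmax"], x)
        bounds["ymin"] = min(bounds["ymin"], y)
        bounds["ymax"] = max(bounds["ymax"], y)
    return gene_counts, bounds


def summarize_file(path):
    """Same summary for a gzip-compressed file such as transcripts.unsorted.tsv.gz."""
    with gzip.open(path, "rt", newline="") as fh:
        return summarize_transcripts(fh)


# Made-up rows standing in for a real transcript file.
demo = ("X\tY\tgene\tcount\n"
        "1.50\t2.00\tActb\t2\n"
        "10.25\t0.75\tGapdh\t1\n"
        "3.00\t8.00\tActb\t1\n")
counts, bounds = summarize_transcripts(io.StringIO(demo))
print(dict(counts))  # {'Actb': 3, 'Gapdh': 1}
print(bounds)        # {'xmin': 1.5, 'xmax': 10.25, 'ymin': 0.75, 'ymax': 8.0}
```

The per-gene totals should correspond to the count column of feature.clean.tsv.gz and the bounds to coordinate_minmax.tsv, provided no additional filtering was applied between the two.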
SGE Images¶
- When --sge-visual is enabled, a monochrome PNG image is generated to visualize the SGE data.
- When --north-up is enabled, a georeferenced TIFF image is produced with a north-up orientation.