Spatial Gene Expression Format Conversion
Overview
Spatial gene expression (SGE) datasets vary widely in format and resolution across platforms. Because FICTURE requires SGE in a specific format, the CartLoader toolkit provides the sge_convert module to standardize raw, platform-specific SGE into a FICTURE-compatible format.
Requirements
Input Data Requirements
Ensure the input data (raw, platform‑specific SGE) is transcript‑indexed and contains at least the following fields:
- Spatial coordinates (X, Y)
- Feature metadata (such as gene symbols)
- Expression counts
Platform Compatibility
The current sge_convert supports standardizing SGE from the following platforms:
| Source | --platform Option | Required Input Files |
|---|---|---|
| 10x Visium HD | 10x_visium_hd | --in-mex, --pos-parquet, --scale-json |
| Seq-Scope | seqscope | --in-mex |
| 10x Xenium | 10x_xenium | --in-csv (common); --in-parquet also supported |
| Stereo-seq | bgi_stereoseq | --in-csv |
| CosMx SMI | cosmx_smi | --in-csv |
| Vizgen MERSCOPE | vizgen_merscope | --in-csv |
| Pixel-seq | pixel_seq | --in-csv |
| Nova-ST | nova_st | --in-csv |
| Generic CSV/TSV input [1] | generic | --in-csv |

[1] For SGE from platforms not yet explicitly supported by CartLoader, or from custom/preprocessed sources, sge_convert provides a generic option that accepts CSV/TSV files with the basic required fields (e.g., gene, spatial coordinates, expression count) for standardization and processing.
Pre-installed tools
- gzip (or pigz)
- spatula (required if --sge-visual is set)
- gdal_translate, gdalwarp (required if --sge-visual is set with --north-up)
Example Usage
1) Input SGE in MEX Format
1.1) Seq-Scope
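A minimal sketch of a Seq-Scope conversion, assembled only from parameters documented on this page. The `cartloader sge_convert` entry point and every path are assumptions for illustration. The script prints the command instead of running it, so it can be inspected first.

```python
import shlex

# Hypothetical paths; replace them with your own.
mex_dir = "/path/to/seqscope/mex"
out_dir = "/path/to/output/sge"

# Assumption: the module is invoked as `cartloader sge_convert`.
cmd = [
    "cartloader", "sge_convert",
    "--platform", "seqscope",
    "--in-mex", mex_dir,
    "--out-dir", out_dir,
    "--sge-visual",      # optional: also draw an image of the SGE
    "--n-jobs", "4",     # optional: number of parallel jobs
]

# Print a copy-pasteable shell command rather than executing it.
print(shlex.join(cmd))
```

Depending on the coordinate units of your data, --units-per-um may also need to be set.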
1.2) 10x Visium HD
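For 10x Visium HD, the three required inputs from the platform table can be passed together. The sketch below assumes a hypothetical directory layout and the `cartloader sge_convert` entry point; only the file names tissue_positions.parquet and scalefactors_json.json come from the parameter descriptions on this page.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical directory layout; adjust to your own Visium HD output.
run_dir = PurePosixPath("/path/to/visium_hd")

cmd = [
    "cartloader", "sge_convert",   # assumed entry point
    "--platform", "10x_visium_hd",
    "--in-mex", str(run_dir / "filtered_feature_bc_matrix"),
    "--pos-parquet", str(run_dir / "spatial" / "tissue_positions.parquet"),
    "--scale-json", str(run_dir / "spatial" / "scalefactors_json.json"),
    "--out-dir", "/path/to/output/sge",
]

# Print the command for inspection instead of running it.
print(shlex.join(cmd))
```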
2) Input SGE in TSV/CSV Format
This applies to platforms whose SGE is provided in TSV/CSV format. Below is an example converting SGE from Stereo-seq.
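A hedged sketch of a Stereo-seq conversion from a TSV file. The entry point `cartloader sge_convert` and the input path are assumptions. The --csv-* values shown simply restate the documented Stereo-seq defaults, to illustrate where overrides would go if your column names differ.

```python
import shlex

cmd = [
    "cartloader", "sge_convert",   # assumed entry point
    "--platform", "bgi_stereoseq",
    "--in-csv", "/path/to/stereoseq/transcripts.tsv.gz",  # hypothetical path
    "--out-dir", "/path/to/output/sge",
    # These restate the documented defaults; change them only if the
    # header of your file uses different column names.
    "--csv-colname-x", "x",
    "--csv-colname-y", "y",
    "--csv-colnames-count", "MIDCounts",
    "--csv-colname-feature-name", "geneID",
    "--filter-by-density",         # optional: density-based filtering
]

# Print the command for inspection instead of running it.
print(shlex.join(cmd))
```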
Verify Your Input Structure
Always double-check the column names and metadata format of your input files. If they differ from the expected defaults, override them with the --csv-* and --min-phred-score options.
Platform-specific Default Settings
To streamline the process, sge_convert automatically applies platform‑dependent defaults for CSV/TSV parsing based on known file formats and column conventions.
The table below summarizes the default values of key parameters for each supported platform:

| Platform | --csv-comment [1] | --csv-delim | --csv-colname-x | --csv-colname-y | --csv-colnames-count | --csv-colname-feature-name |
|---|---|---|---|---|---|---|
| 10x Xenium [2] | False | , | x_location | y_location | - | feature_name |
| Stereo-seq | False | \t | x | y | MIDCounts | geneID |
| CosMx SMI | False | , | x_local_px | y_local_px | - | target |
| MERSCOPE | False | , | global_x | global_y | - | gene |
| Pixel-seq | False | \t | xcoord | ycoord | - | geneName |

[1] --csv-comment: If True, lines starting with # are treated as comments and skipped.
[2] 10x Xenium: In addition to the defaults above, sge_convert applies --csv-colname-phredscore qv and --min-phred-score 20 to 10x Xenium data.
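One way to verify an input file before conversion is to compare its header against the defaults listed above. The following is an illustrative sketch, not part of CartLoader: the dictionary copies the documented defaults, and the helper name `missing_columns` is hypothetical.

```python
import csv
import io

# Default delimiters and column names per platform, copied from the table above.
# A count of None means each row is treated as one transcript.
PLATFORM_DEFAULTS = {
    "10x_xenium": {"delim": ",", "x": "x_location", "y": "y_location",
                   "count": None, "feature": "feature_name"},
    "bgi_stereoseq": {"delim": "\t", "x": "x", "y": "y",
                      "count": "MIDCounts", "feature": "geneID"},
    "cosmx_smi": {"delim": ",", "x": "x_local_px", "y": "y_local_px",
                  "count": None, "feature": "target"},
    "vizgen_merscope": {"delim": ",", "x": "global_x", "y": "global_y",
                        "count": None, "feature": "gene"},
    "pixel_seq": {"delim": "\t", "x": "xcoord", "y": "ycoord",
                  "count": None, "feature": "geneName"},
}


def missing_columns(header_line, platform):
    """Return the default column names that are absent from a header line."""
    defaults = PLATFORM_DEFAULTS[platform]
    header = next(csv.reader(io.StringIO(header_line), delimiter=defaults["delim"]))
    expected = [defaults[k] for k in ("x", "y", "count", "feature")
                if defaults[k] is not None]
    return [name for name in expected if name not in header]


# A Stereo-seq style header that matches the defaults.
print(missing_columns("geneID\tx\ty\tMIDCounts\n", "bgi_stereoseq"))
# A header that would need --csv-colname-* overrides for 10x Xenium.
print(missing_columns("gene,x,y\n", "10x_xenium"))
```

Any names reported as missing indicate which --csv-* options need to be overridden.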
Actions
Action Specifications
SGE conversion always runs. Density-based filtering and visualization are optional and are enabled with --filter-by-density and --sge-visual, respectively.
SGE Conversion (always runs)
Converts SGE into a FICTURE-compatible TSV format. During conversion, SGE coordinates are rescaled to micrometers based on the pixel resolution specified in the input. Features (typically genes) can optionally be filtered in this step.
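The rescaling step can be illustrated with a small sketch. It assumes that input coordinates are in pixels, so that the number of coordinate units per micrometer is the reciprocal of microns_per_pixel, and it rounds to two decimals to mirror the documented --precision-um default. The function name is hypothetical and this is not CartLoader's implementation.

```python
def to_micrometers(x, y, units_per_um=1.0, precision=2):
    """Convert platform coordinates to micrometers and round them."""
    if units_per_um <= 0:
        raise ValueError("units_per_um must be positive")
    return (round(x / units_per_um, precision),
            round(y / units_per_um, precision))


# Hypothetical example: pixels that are 0.5 µm wide, i.e. 2 units per µm.
microns_per_pixel = 0.5
units_per_um = 1.0 / microns_per_pixel

print(to_micrometers(1001, 2500, units_per_um))  # (500.5, 1250.0)
```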
Density-based Filtering (--filter-by-density)
If --filter-by-density is set, sge_convert automatically identifies and retains high-quality tissue regions based on transcript density and spatial structure. This step takes the format-standardized SGE as input and generates a density-filtered SGE.
SGE Visualization (--sge-visual)
If --sge-visual is set, sge_convert draws an image of the 2D points in the SGE. The optional --north-up flag ensures correct spatial orientation (i.e., the Y-axis increases upward/north and the X-axis increases to the right/east).
Parameters
Below are the core parameters. See the auxiliary parameter sections below for more details.
SGE Conversion
- --platform (str): Source platform, used to infer the input format and defaults. Options: 10x_visium_hd, seqscope, 10x_xenium, bgi_stereoseq, cosmx_smi, vizgen_merscope, pixel_seq, nova_st, generic.
- --in-json (str): Input manifest JSON. If set, --in-mex/--in-parquet/--in-csv/--pos-parquet/--scale-json can be skipped (platforms: 10x_xenium, 10x_visium_hd).
- --in-mex (str): Path to input MEX directory (platforms: 10x Visium HD, Seq-Scope).
- --in-csv (str): Path to input CSV/TSV (platforms: 10x Xenium, BGI Stereo-seq, CosMx SMI, Vizgen MERSCOPE, Pixel-seq, Nova-ST, generic).
- --in-parquet (str): Path to input transcript parquet (platform: 10x Xenium).
- --pos-parquet (str): Path to position parquet with spatial coordinates (platform: 10x Visium HD; typical: tissue_positions.parquet).
- --scale-json (str): Path to scale JSON; if set, derives --units-per-um from microns_per_pixel (platform: 10x Visium HD; typical: scalefactors_json.json).
- --units-per-um (float): Coordinate units per µm (default: 1.00). Prefer --scale-json for 10x Visium HD.
- --out-dir (str): Output directory.
- --include-feature-regex (regex): Regex of feature/gene names to include.
- --exclude-feature-regex (regex): Regex of feature/gene names to exclude.
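The effect of --include-feature-regex and --exclude-feature-regex can be sketched as follows. This is an illustration rather than the tool's code: it assumes search-style matching and that an exclusion takes precedence over an inclusion, and both the pattern and the feature names are made up.

```python
import re


def keep_feature(name, include_regex=None, exclude_regex=None):
    """Return True if a feature name passes the include/exclude patterns."""
    if include_regex is not None and not re.search(include_regex, name):
        return False
    if exclude_regex is not None and re.search(exclude_regex, name):
        return False
    return True


# Hypothetical pattern that drops control probes by prefix.
exclude = r"^(BLANK|NegCon)"
features = ["Actb", "BLANK_0001", "NegControlProbe_12", "Gapdh"]

print([f for f in features if keep_feature(f, exclude_regex=exclude)])
# ['Actb', 'Gapdh']
```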
Auxiliary SGE Conversion Parameters
We recommend using the default values; override only if needed. For more details, see the sge_convert help message.
Auxiliary Input MEX Parameters:
- --icols-mtx (int or comma-separated list): Comma-separated, 1-based indices of the target genomic features among the count columns in the input matrix file (default: 1).
- --colnames-count (string or comma-separated list): Comma-separated output column names for the specified genomic features (default: count). The number of names given in --colnames-count must match the number of indices given in --icols-mtx.
Auxiliary Input CSV/TSV Parameters:
- --csv-comment (flag): If enabled, lines starting with # are skipped (default: False for 10x Xenium, Stereo-seq, CosMx SMI, MERSCOPE, and Pixel-seq).
- --csv-delim (str): Delimiter for the input file (default: "," for 10x Xenium, CosMx SMI, and MERSCOPE; "\t" for Stereo-seq and Pixel-seq).
- --csv-colname-x (str): Column name for X coordinates (default: x_location for 10x Xenium; x for Stereo-seq; x_local_px for CosMx SMI; global_x for MERSCOPE; xcoord for Pixel-seq).
- --csv-colname-y (str): Column name for Y coordinates (default: y_location for 10x Xenium; y for Stereo-seq; y_local_px for CosMx SMI; global_y for MERSCOPE; ycoord for Pixel-seq).
- --csv-colnames-count (str): Comma-separated column names for expression count. If not provided, defaults to a count of 1 per transcript (default: MIDCounts for Stereo-seq).
- --csv-colname-feature-name (str): Column name for gene name (default: feature_name for 10x Xenium; geneID for Stereo-seq; target for CosMx SMI; gene for MERSCOPE; geneName for Pixel-seq).
- --csv-colnames-others (str): Column names to keep.
- --csv-colname-phredscore (str): Column name for the Phred-scaled quality value estimating the probability of incorrect calls (default: qv for 10x Xenium).
- --min-phred-score (float): Minimum Q-score to retain a transcript.
    - Default for 10x_xenium: 20.0
    - Default for others: 0.0
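To put the Phred threshold in context: a Phred-scaled score Q corresponds to an error probability of 10^(-Q/10), so the 10x Xenium default of 20 keeps calls with an estimated error probability of at most 1%. The sketch below uses hypothetical helper names and assumes a transcript is retained when its score is greater than or equal to the threshold.

```python
def phred_to_error_prob(q):
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


def passes_quality(qv, min_phred_score=20.0):
    """Assumed rule: keep a transcript whose score meets the minimum."""
    return qv >= min_phred_score


print(f"{phred_to_error_prob(20):.4f}")  # 0.0100
print([qv for qv in (5.0, 19.9, 20.0, 40.0) if passes_quality(qv)])  # [20.0, 40.0]
```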
Auxiliary Output Parameters:
- --out-transcript (str): File name for the output compressed transcript-indexed SGE file in TSV format (default: transcripts.unsorted.tsv.gz).
- --out-minmax (str): File name for coordinate min–max values in TSV format (default: coordinate_minmax.tsv).
- --out-feature (str): File name for the compressed UMI count per gene in TSV format (default: feature.clean.tsv.gz).
- --precision-um (int): Decimal precision for transcript coordinates; set to 0 to round to integers (default: 2).
- --colname-x (str): Column name for the X coordinate in the output SGE (default: X).
- --colname-y (str): Column name for the Y coordinate in the output SGE (default: Y).
- --colname-count (str): Comma-separated column names for count in the output SGE (default: count).
- --colname-feature-name (str): Column name for the gene name in the output SGE (default: gene).
- --out-json (str): Output JSON manifest of SGE paths (default: <out-dir>/sge_assets.json).
Environment Parameters
If the binaries are already available in your system's PATH, you may omit these options.
- --gzip (str): Path to gzip binary (default: gzip).
- --pigz (str): Path to pigz binary for parallel gzip compression (default: pigz).
- --pigz-threads (int): Number of threads for pigz (default: 4).
- --spatula (str): Path to spatula binary (default: spatula).
- --parquet-tools (str): Path to parquet-tools binary (used with --in-parquet or --pos-parquet; default: parquet-tools).
Run Parameters:
- --dry-run (flag): Generate the Makefile; do not execute.
- --restart (flag): Ignore existing outputs and rerun all steps.
- --makefn (str): Makefile name to write (default: sge_convert.mk).
- --n-jobs (int): Number of parallel jobs (default: 1).
Density-based Filtering
- --filter-by-density (flag): Enable SGE filtering by density.
- --out-filtered-prefix (str): Prefix for output filtered SGE files (default: filtered).
Auxiliary Density‑based Filtering Parameters
We recommend using default values; override only if needed.
- --radius (int): Radius for the polygon area calculation (default: 15).
- --quartile (int): Quartile for the polygon area calculation (default: 2).
- --hex-n-move (int): Sliding step (default: 1).
- --polygon-min-size (int): Minimum polygon size (default: 500).
SGE Visualization
- --sge-visual (flag): Enable SGE visualization.
- --north-up (flag): Enable north-up orientation for the SGE visualization.
Auxiliary SGE Visualization Parameters
We recommend using default values; override only if needed.
- --out-xy (str): File name for the output SGE visualization image (default: xy.png).
- --out-northup-tif (str): File name for the output north-up oriented image (default: xy_northup.tif).
- --srs (str): Spatial reference system (used with --north-up; default: EPSG:3857).
- --resample (str): Resampling method (used with --north-up; options: near, bilinear, cubic, etc.; default: cubic).
- --gdal_translate (str): Path to gdal_translate binary (used with --north-up; default: gdal_translate).
- --gdalwarp (str): Path to gdalwarp binary (used with --north-up; default: gdalwarp).
Output
CartLoader generates the following harmonized outputs:
Unified SGE matrix
Both SGE conversion and density-based filtering generate a unified SGE matrix, consisting of:
transcripts.unsorted.tsv.gz: transcript-indexed SGE in TSV
- X: X coordinates in µm
- Y: Y coordinates in µm
- gene: gene symbols
- count: expression count per pixel per gene
feature.clean.tsv.gz: UMI counts on a per-gene basis in TSV
- gene: gene symbols
- gene_id: gene IDs
- count: expression count per gene
coordinate_minmax.tsv: X Y min/max coordinates
- xmin, xmax: min and max X coordinates in µm
- ymin, ymax: min and max Y coordinates in µm
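The relationship between the three files can be illustrated by recomputing per-gene counts and coordinate bounds from a transcript file. This sketch assumes the default output column names (X, Y, gene, count), a single count column, and a file with at least one data row; the function name is hypothetical and the code is not part of CartLoader.

```python
import csv
import gzip
from collections import Counter


def summarize_transcripts(path):
    """Return per-gene counts and coordinate bounds from a gzipped transcript TSV."""
    gene_counts = Counter()
    xs, ys = [], []
    with gzip.open(path, "rt", newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            gene_counts[row["gene"]] += int(row["count"])
            xs.append(float(row["X"]))
            ys.append(float(row["Y"]))
    bounds = {"xmin": min(xs), "xmax": max(xs),
              "ymin": min(ys), "ymax": max(ys)}
    return gene_counts, bounds
```

Calling `summarize_transcripts("transcripts.unsorted.tsv.gz")` on a converted dataset should give totals and bounds that can be compared against feature.clean.tsv.gz and coordinate_minmax.tsv as a sanity check.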
SGE Images
- When --sge-visual is enabled, a monochrome PNG image is generated to visualize the SGE data.
- When --north-up is enabled, a georeferenced TIFF image is produced with a north-up orientation.