Spatial Gene Expression Format Conversion¶
Overview¶
SGE datasets vary widely in format and resolution across platforms. Since FICTURE requires SGE in a specific format, the cartloader toolkit provides the sge_convert module to standardize raw, platform-specific SGE into a FICTURE-compatible format.
Requirements¶
Input Data Requirement:
The input data (raw, platform-specific SGE) should be a transcript-indexed SGE containing at least:
- Spatial coordinates (X coordinates, Y coordinates)
- Feature metadata (such as gene symbols)
- Expression Counts
Platform Compatibility
The current sge_convert supports standardizing SGE from the following platforms:
| Source | --platform Option | Required Input Files |
|---|---|---|
| 10x Visium HD | 10x_visium_hd | --in-mex, --in-parquet, --scale-json |
| Seq-Scope | seqscope | --in-mex |
| 10x Xenium | 10x_xenium | --in-csv |
| Stereo-seq | bgi_stereoseq | --in-csv |
| CosMx SMI | cosmx_smi | --in-csv |
| Vizgen MERSCOPE | vizgen_merscope | --in-csv |
| Pixel-seq | pixel_seq | --in-csv |
| Generic CSV/TSV input¹ | generic | --in-csv |

¹ For SGE from platforms not yet explicitly supported by cartloader, or from custom/preprocessed sources, sge_convert provides a generic option that accepts CSV/TSV files with the basic required fields (e.g., gene, spatial coordinates, expression count) for standardization and processing.
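The compatibility table can be turned into a quick pre-flight check. The sketch below is only an illustration: the dictionary is transcribed from the table above, while the helper name `missing_inputs` is hypothetical and not part of cartloader.

```python
# Required input flags per --platform value, transcribed from the table above.
REQUIRED_INPUTS = {
    "10x_visium_hd": ["--in-mex", "--in-parquet", "--scale-json"],
    "seqscope": ["--in-mex"],
    "10x_xenium": ["--in-csv"],
    "bgi_stereoseq": ["--in-csv"],
    "cosmx_smi": ["--in-csv"],
    "vizgen_merscope": ["--in-csv"],
    "pixel_seq": ["--in-csv"],
    "generic": ["--in-csv"],
}


def missing_inputs(platform, provided_flags):
    """Return the required input flags that were not provided.

    Hypothetical helper for illustration; cartloader performs its own checks.
    """
    if platform not in REQUIRED_INPUTS:
        raise ValueError(f"unknown platform: {platform}")
    provided = set(provided_flags)
    return [flag for flag in REQUIRED_INPUTS[platform] if flag not in provided]


# A Visium HD run that only supplies the MEX directory is missing two inputs.
print(missing_inputs("10x_visium_hd", ["--in-mex"]))
```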
Example Usages¶
Input SGE in MEX Format¶
Seq-Scope¶
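As a rough sketch of a Seq-Scope conversion, the snippet below assembles a command line from flags documented on this page. The `cartloader sge_convert` invocation form is an assumption, and every path and the `--units-per-um` value are placeholders to be replaced with your own.

```python
import shlex

# Placeholder resolution: set this to the coordinate units per micrometer
# of your own dataset (the documented default is 1.00).
UNITS_PER_UM = 1.0

argv = [
    "cartloader", "sge_convert",             # assumed invocation form
    "--platform", "seqscope",
    "--in-mex", "/path/to/seqscope/mex_dir",  # hypothetical path
    "--units-per-um", str(UNITS_PER_UM),
    "--out-dir", "/path/to/output",           # hypothetical path
]

# shlex.join quotes arguments so the string can be pasted into a shell.
command = shlex.join(argv)
print(command)
```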
10X Visium HD¶
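For 10X Visium HD, three inputs are required. The sketch below assembles such a command under several assumptions: the `cartloader sge_convert` invocation form, the base directory, and the MEX folder name are hypothetical, while `tissue_positions.parquet` and `scalefactors_json.json` are the typical file names mentioned in the parameter list. `--units-per-um` is left out because `--scale-json` is supplied.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical location of the Visium HD files; replace with your own.
base = PurePosixPath("/path/to/visium_hd")

argv = [
    "cartloader", "sge_convert",                       # assumed invocation form
    "--platform", "10x_visium_hd",
    "--in-mex", str(base / "mex_dir"),                  # hypothetical folder name
    "--in-parquet", str(base / "tissue_positions.parquet"),
    "--scale-json", str(base / "scalefactors_json.json"),
    "--out-dir", "/path/to/output",                     # hypothetical path
]

command = shlex.join(argv)
print(command)
```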
Input SGE in TSV/CSV Format¶
This applies to input SGE in TSV/CSV format from platforms including 10X Xenium, StereoSeq, CosMx SMI, MERSCOPE, and Pixel-seq. To simplify preprocessing, sge_convert automatically applies platform-specific defaults for common CSV/TSV parameters.
Below is an example converting SGE from StereoSeq.
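A StereoSeq conversion might be assembled as sketched here. The helper `stereoseq_argv` is hypothetical, the `cartloader sge_convert` invocation form is an assumption, and the paths, resolution, and override values are placeholders. Overrides are only needed when the input columns differ from the platform defaults.

```python
import shlex


def stereoseq_argv(in_csv, out_dir, units_per_um, csv_overrides=None):
    """Build an argument list for a StereoSeq conversion (illustrative only)."""
    argv = [
        "cartloader", "sge_convert",   # assumed invocation form
        "--platform", "bgi_stereoseq",
        "--in-csv", in_csv,
        "--units-per-um", str(units_per_um),
        "--out-dir", out_dir,
    ]
    # Optional --csv-* overrides, e.g. when column names differ from the defaults.
    for flag, value in (csv_overrides or {}).items():
        argv += [flag, value]
    return argv


# Placeholder paths and resolution; the override column name is hypothetical.
argv = stereoseq_argv(
    "/path/to/stereoseq.tsv", "/path/to/output", 1.0,
    csv_overrides={"--csv-colname-feature-name": "gene_symbol"},
)
print(shlex.join(argv))
```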
Verify your input structure
Always double-check the column names and metadata format of your input files. If they differ from the expected defaults, override them using --csv-* and --min-phred-score options.
Platform-specific default settings
sge_convert applies platform-dependent defaults for CSV/TSV parsing based on known file formats and column conventions.
The table below summarizes the default values of key parameters for each supported platform:
| Platform | --csv-comment¹ | --csv-delim | --csv-colname-x | --csv-colname-y | --csv-colnames-count | --csv-colname-feature-name |
|---|---|---|---|---|---|---|
| 10X Xenium² | False | , | x_location | y_location | - | feature_name |
| StereoSeq | False | \t | x | y | MIDCounts | geneID |
| CosMx SMI | False | , | x_local_px | y_local_px | - | target |
| MERSCOPE | False | , | global_x | global_y | - | gene |
| Pixel-seq | False | \t | xcoord | ycoord | - | geneName |

¹ --csv-comment: If True, lines starting with # are treated as comments and skipped.
² 10X Xenium: In addition to the defaults above, sge_convert applies --csv-colname-phredscore qv and --min-phred-score 20 to 10X Xenium data.
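One way to double-check an input file against these defaults is to compare its header line with the expected column names. The sketch below copies the defaults from the table above; the helper `missing_columns` is hypothetical and not part of cartloader.

```python
import csv
import io

# Default delimiter and column names per --platform value, from the table above.
DEFAULT_COLUMNS = {
    "10x_xenium": {"delim": ",", "x": "x_location", "y": "y_location",
                   "feature": "feature_name"},
    "bgi_stereoseq": {"delim": "\t", "x": "x", "y": "y",
                      "feature": "geneID", "count": "MIDCounts"},
    "cosmx_smi": {"delim": ",", "x": "x_local_px", "y": "y_local_px",
                  "feature": "target"},
    "vizgen_merscope": {"delim": ",", "x": "global_x", "y": "global_y",
                        "feature": "gene"},
    "pixel_seq": {"delim": "\t", "x": "xcoord", "y": "ycoord",
                  "feature": "geneName"},
}


def missing_columns(header_line, platform):
    """Return default column names that are absent from the header line."""
    spec = DEFAULT_COLUMNS[platform]
    header = next(csv.reader(io.StringIO(header_line), delimiter=spec["delim"]))
    expected = [name for key, name in spec.items() if key != "delim"]
    return [name for name in expected if name not in header]


# A MERSCOPE-like header that uses x/y instead of global_x/global_y.
print(missing_columns("gene,x,y\n", "vizgen_merscope"))
```

Any name reported as missing is a candidate for one of the --csv-colname-* overrides.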
Actions¶
SGE Conversion¶
Converts SGE into a FICTURE-compatible TSV format. During conversion, SGE coordinates are rescaled to micrometer units based on the pixel resolution specified in the input. Feature (typically gene) filtering can also be applied at this step.
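The rescaling and filtering described above can be pictured with a small sketch. It assumes that micrometer coordinates are obtained by dividing raw coordinates by `--units-per-um`, that rounding follows `--precision-um`, and that the include/exclude patterns are applied with a regex search. The function `convert_record` is hypothetical and is not cartloader's implementation.

```python
import re


def convert_record(x, y, gene, units_per_um=1.0, precision_um=2,
                   include_regex=None, exclude_regex=None):
    """Return (x_um, y_um, gene), or None if the gene is filtered out."""
    # Assumed semantics: keep only genes matching the include pattern ...
    if include_regex and not re.search(include_regex, gene):
        return None
    # ... and drop genes matching the exclude pattern.
    if exclude_regex and re.search(exclude_regex, gene):
        return None
    # Coordinate units per micrometer -> divide to obtain micrometers.
    x_um = round(x / units_per_um, precision_um)
    y_um = round(y / units_per_um, precision_um)
    return (x_um, y_um, gene)


print(convert_record(100, 250, "Actb", units_per_um=4.0))
print(convert_record(1, 1, "mt-Co1", exclude_regex=r"^mt-"))
```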
(Optional) Density-based Filtering¶
Automatically identifies and retains high-quality tissue regions based on transcript density and spatial structure. This step takes the format-standardized SGE as input and generates a density-based filtered SGE.
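To give an intuition for density-based filtering, the sketch below uses a deliberately simplified stand-in: it bins transcripts into square tiles and keeps those in tiles whose total count reaches a threshold. This is not cartloader's procedure, which, judging from its parameters, works with hexagons and polygons. The function name, tile size, and threshold are all hypothetical.

```python
import math
from collections import Counter


def filter_by_tile_density(records, tile_um=10.0, min_count=3):
    """Keep records located in square tiles with total count >= min_count.

    records: iterable of (x_um, y_um, gene, count) tuples.
    Simplified square-grid illustration, not the hexagon/polygon method.
    """
    records = list(records)

    def tile_of(x, y):
        return (math.floor(x / tile_um), math.floor(y / tile_um))

    # First pass: total expression count per tile.
    totals = Counter()
    for x, y, _gene, count in records:
        totals[tile_of(x, y)] += count

    # Second pass: retain records from sufficiently dense tiles.
    return [r for r in records if totals[tile_of(r[0], r[1])] >= min_count]


records = [(1.0, 1.0, "A", 2), (2.0, 3.0, "B", 2), (55.0, 55.0, "A", 1)]
print(filter_by_tile_density(records))
```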
(Optional) SGE Visualization¶
Draws an image of the 2D points provided as input. In this step, the --north-up option can be enabled to ensure correct spatial orientation (i.e., the Y-axis increases upward/north and the X-axis increases to the right/east).
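The orientation convention can be illustrated with a toy rasterizer. Image rows are usually stored top to bottom, so a north-up picture places larger Y values in earlier rows. This is only a sketch of the convention with a hypothetical function name; cartloader itself relies on gdal_translate and gdalwarp for the georeferenced output, per the parameter list.

```python
def rasterize_north_up(points, width, height):
    """Draw integer (x, y) points into a 0/1 grid with Y increasing upward.

    Row 0 of the returned grid is the top of the image. Assumes
    0 <= x < width and 0 <= y < height for every point.
    """
    image = [[0] * width for _ in range(height)]
    for x, y in points:
        row = height - 1 - y  # larger Y is drawn nearer the top
        image[row][x] = 1     # X increases to the right
    return image


for row in rasterize_north_up([(0, 0), (2, 1)], width=3, height=2):
    print(row)
```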
Parameters¶
The following outlines the minimum required parameters.
For most auxiliary parameters, the default values are recommended; modify them only when they do not suit your use case. See more details in the collapsible sections below or in the command-line help of sge_convert.
SGE Conversion¶
- --platform (str): Source platform, used to infer the input file format and default settings (options: "10x_visium_hd", "seqscope", "10x_xenium", "bgi_stereoseq", "cosmx_smi", "vizgen_merscope", "pixel_seq", "generic").
- --in-mex (str): Path to the input SGE directory in MEX format. Required for MEX-formatted data (e.g., 10X Visium HD, SeqScope).
- --in-csv (str): Path to the input SGE file in CSV or TSV format. Required for CSV/TSV-formatted data (e.g., 10X Xenium, StereoSeq, CosMx SMI, MERSCOPE, Pixel-seq).
- --in-parquet (str): Path to the input parquet file with spatial coordinates, if available. Typically named tissue_positions.parquet (10X Visium HD).
- --scale-json (str): Path to the scale JSON file used to compute --units-per-um, if available. Typically named scalefactors_json.json (10X Visium HD).
- --units-per-um (float): Coordinate units per micrometer (default: 1.00). Skip if --scale-json is provided.
- --out-dir (str): Path to the output directory.
- --include-feature-regex (regex): (Optional) A regex pattern of feature/gene names to be included.
- --exclude-feature-regex (regex): (Optional) A regex pattern of feature/gene names to be excluded.
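How --scale-json could translate into --units-per-um is sketched below. It rests on an assumption: that the Visium HD scalefactors_json.json carries a `microns_per_pixel` field and that units per micrometer is its reciprocal. The function name is hypothetical, and the exact key cartloader reads should be checked against your own file.

```python
import json


def units_per_um_from_scale_json(text):
    """Derive coordinate units per micrometer from scale-factor JSON text.

    Assumes a "microns_per_pixel" key; this key name is an assumption
    and is not taken from the cartloader documentation.
    """
    scale = json.loads(text)
    return 1.0 / scale["microns_per_pixel"]


# Toy JSON content with a made-up resolution of 0.25 um per pixel.
print(units_per_um_from_scale_json('{"microns_per_pixel": 0.25}'))
```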
Auxiliary SGE Conversion Parameters
Auxiliary Input MEX Parameters:
- --icols-mtx (int or comma-separated list): Comma-separated, 1-based indices of the target genomic features among the count columns in the input matrix file (default: 1).
- --colnames-count (string or comma-separated list): Comma-separated output column names for the specified genomic features (default: count). The number of names specified by --colnames-count must match the number of indices provided in --icols-mtx.
Auxiliary Input CSV/TSV Parameters:
- --csv-comment (flag): If enabled, lines starting with # will be skipped (default: False for 10X Xenium, StereoSeq, CosMx SMI, MERSCOPE, and Pixel-seq).
- --csv-delim (str): Delimiter for the input file (default: "," for 10X Xenium, CosMx SMI, and MERSCOPE; "\t" for StereoSeq and Pixel-seq).
- --csv-colname-x (str): Column name for X coordinates (default: x_location for 10X Xenium; x for StereoSeq; x_local_px for CosMx SMI; global_x for MERSCOPE; xcoord for Pixel-seq).
- --csv-colname-y (str): Column name for Y coordinates (default: y_location for 10X Xenium; y for StereoSeq; y_local_px for CosMx SMI; global_y for MERSCOPE; ycoord for Pixel-seq).
- --csv-colnames-count (str): Comma-separated column names for expression count. If not provided, a count of 1 is assigned per transcript (default: MIDCounts for StereoSeq).
- --csv-colname-feature-name (str): Column name for gene name (default: feature_name for 10X Xenium; geneID for StereoSeq; target for CosMx SMI; gene for MERSCOPE; geneName for Pixel-seq).
- --csv-colnames-others (str): Column names to keep.
- --csv-colname-phredscore (str): Column name for the Phred-scaled quality value estimating the probability of incorrect calls (default: qv for 10X Xenium).
- --min-phred-score (int): Phred-scaled quality score cutoff (default: 20 for 10X Xenium).
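The interplay of these CSV options can be sketched on a toy Xenium-like table. The function `read_transcripts` is hypothetical; it assumes that the Phred cutoff keeps values greater than or equal to the threshold and that a missing count column means one count per transcript.

```python
import csv
import io


def read_transcripts(text, delim=",", col_x="x_location", col_y="y_location",
                     col_feature="feature_name", col_count=None,
                     col_phred=None, min_phred=20):
    """Parse delimited text into (x, y, feature, count) tuples."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text), delimiter=delim):
        # Assumed cutoff semantics: drop transcripts below the threshold.
        if col_phred is not None and float(rec[col_phred]) < min_phred:
            continue
        # Without a count column, each transcript contributes a count of 1.
        count = int(rec[col_count]) if col_count else 1
        rows.append((float(rec[col_x]), float(rec[col_y]),
                     rec[col_feature], count))
    return rows


toy = (
    "feature_name,x_location,y_location,qv\n"
    "Actb,1.5,2.0,35.2\n"
    "Gapdh,3.0,4.0,12.0\n"
)
print(read_transcripts(toy, col_phred="qv"))
```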
Auxiliary Output Parameters:
- --out-transcript (str): File name for the output compressed transcript-indexed SGE file in TSV format (default: transcripts.unsorted.tsv.gz).
- --out-minmax (str): File name for coordinate min-max values in TSV format (default: coordinate_minmax.tsv).
- --out-feature (str): File name for compressed UMI count per gene in TSV format (default: feature.clean.tsv.gz).
- --precision-um (int): Decimal precision for transcript coordinates; set to 0 to round to integers (default: 2).
- --colname-x (str): Column name for the X-coordinate in the output SGE (default: X).
- --colname-y (str): Column name for the Y-coordinate in the output SGE (default: Y).
- --colnames-count (str): Comma-separated column names for expression count in the output SGE (default: count).
- --colname-feature-name (str): Column name for the gene name in the output SGE (default: gene).
Auxiliary Environment Parameters
If the binaries are already available in your system's PATH, you may omit these options.
- --gzip (str): Path to the gzip binary; consider pigz -p 4 for faster processing (default: gzip).
- --spatula (str): Path to the spatula binary (default: spatula).
- --parquet-tools (str): Required if --in-parquet is used; path to the parquet-tools binary (default: parquet-tools).
(Optional) Density-based Filtering¶
- --filter-by-density (flag): Enable filtering of SGE by density.
- --out-filtered-prefix (str): Prefix for output filtered SGE files (default: filtered).
- --genomic-feature (str): Genomic feature to be used for density-based filtering. Defaults to the value of --colnames-count if only one column name is provided.
Auxiliary Density-based Filtering Parameters
- --mu-scale (float): Scale factor for the polygon area calculation (default: 1.0).
- --radius (int): Radius for the polygon area calculation (default: 15).
- --quartile (int): Quartile for the polygon area calculation (default: 2).
- --hex-n-move (int): Sliding step (default: 1).
- --polygon-min-size (int): The minimum polygon size (default: 500).
(Optional) SGE Visualization¶
- --sge-visual (flag): Enable SGE visualization.
- --north-up (flag): Enable the north-up orientation for the SGE visualization.
Auxiliary SGE Visualization Parameters
- --out-xy (str): File name for the output SGE visualization image (default: xy.png).
- --out-northup-tif (str): File name for the output north-up oriented image (default: xy_northup.tif).
- --srs (str): If --north-up is enabled, defines the spatial reference system (default: EPSG:3857).
- --resample (str): If --north-up is enabled, defines the resampling method (default: cubic). Options: near, bilinear, cubic, etc.
- --gdal_translate (str): Required if --north-up is enabled; path to the gdal_translate binary (default: gdal_translate).
- --gdalwarp (str): Required if --north-up is enabled; path to the gdalwarp binary (default: gdalwarp).
Output¶
cartloader generates the following harmonized outputs:
Unified SGE matrix¶
Both SGE conversion and density-based filtering generate a unified SGE matrix, consisting of:
- transcripts.unsorted.tsv.gz: transcript-indexed SGE in TSV format, with the fields:
    - X: X coordinates in µm
    - Y: Y coordinates in µm
    - gene: gene symbols
    - count: expression count per pixel per gene
- feature.clean.tsv.gz: UMI counts on a per-gene basis in TSV format, with the fields:
    - gene: gene symbols
    - gene_id: gene IDs
    - count: expression count per gene
- coordinate_minmax.tsv: minimum and maximum X and Y coordinates, with the fields:
    - xmin, xmax: min and max X coordinates in µm
    - ymin, ymax: min and max Y coordinates in µm
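The relationship between the three outputs can be sketched by recomputing the per-gene totals and the coordinate bounds from transcript-level rows. The toy table below uses the default output column names (X, Y, gene, count) with made-up values; the function `summarize` is hypothetical. For a real file, `gzip.open(path, "rt")` could supply the text instead of the in-memory string.

```python
import csv
import io
from collections import Counter


def summarize(tsv_text):
    """Return (per-gene count totals, coordinate min/max) from transcript TSV text."""
    gene_totals = Counter()
    xs, ys = [], []
    for rec in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        gene_totals[rec["gene"]] += int(rec["count"])
        xs.append(float(rec["X"]))
        ys.append(float(rec["Y"]))
    minmax = {"xmin": min(xs), "xmax": max(xs),
              "ymin": min(ys), "ymax": max(ys)}
    return dict(gene_totals), minmax


# Made-up rows for illustration only.
toy = (
    "X\tY\tgene\tcount\n"
    "1.00\t2.00\tActb\t2\n"
    "3.50\t0.50\tActb\t1\n"
    "2.00\t4.00\tGapdh\t5\n"
)
totals, bounds = summarize(toy)
print(totals)
print(bounds)
```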
SGE Images¶
- When --sge-visual is enabled, a monochrome PNG image is generated to visualize the SGE data.
- When --north-up is enabled, a georeferenced TIFF image is produced with a north-up orientation.