CosMx SMI Starter Tutorial¶

This tutorial walks through a starter end-to-end workflow for CosMx SMI data using an adult mouse hippocampus subset extracted from a coronal brain section.

It covers input preparation, SGE format conversion, FICTURE analysis, asset packaging, and data upload.

Set Up the Environment¶

# ====
# Replace each placeholder with the actual path on your system.  
# ====

work_dir=/path/to/work/directory        # working directory where the input data will be downloaded
cd "$work_dir"

# Define paths to required binaries and resources
spatula=/path/to/spatula/binary         # path to spatula executable
punkst=/path/to/punkst/binary           # path to FICTURE2 (punkst) executable
tippecanoe=/path/to/tippecanoe/binary   # path to tippecanoe executable
pmtiles=/path/to/pmtiles/binary         # path to pmtiles executable
aws=/path/to/aws/cli/binary             # path to AWS CLI binary

# (Optional) Define path to color map. 
cmap=/path/to/color/map                 # Path to fixed color map. `CartLoader` includes one at cartloader/assets/fixed_color_map_256.tsv.

# Number of jobs
n_jobs=10                               # If not specified, the number of jobs defaults to 1.

# Activate the bioconda environment
conda activate ENV_NAME                 # replace ENV_NAME with your conda environment name
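Before moving on, it can help to confirm that each configured path resolves to an executable. The following is a minimal sketch: `check_tool` is a hypothetical helper (not part of CartLoader), and each variable falls back to the bare command name when it has not been set.

```shell
# Hypothetical helper: report whether a tool path (or command name) is executable.
check_tool() {
  local name=$1 tool_path=$2
  if command -v "$tool_path" >/dev/null 2>&1; then
    echo "ok       $name -> $tool_path"
  else
    echo "MISSING  $name -> $tool_path" >&2
    return 1
  fi
}

# Check every tool defined in the setup block; count the ones that cannot be found.
n_missing=0
check_tool spatula    "${spatula:-spatula}"       || n_missing=$((n_missing + 1))
check_tool punkst     "${punkst:-punkst}"         || n_missing=$((n_missing + 1))
check_tool tippecanoe "${tippecanoe:-tippecanoe}" || n_missing=$((n_missing + 1))
check_tool pmtiles    "${pmtiles:-pmtiles}"       || n_missing=$((n_missing + 1))
check_tool aws        "${aws:-aws}"               || n_missing=$((n_missing + 1))
echo "tools missing: $n_missing"
```

A non-zero count means at least one placeholder path still needs to be replaced.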

Prepare Input¶

Data Access¶

The example input data is hosted on Zenodo. Follow the commands below to download it.

cd "$work_dir"
wget https://zenodo.org/records/17953582/files/cosmxsmi_starter.raw.tar.gz
tar --strip-components=1 -zxvf cosmxsmi_starter.raw.tar.gz
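The `--strip-components=1` flag drops the leading path component of every archive member, so files land directly in the working directory rather than inside a wrapper folder. The sketch below illustrates the effect on a throwaway archive; the folder and file names (`top_folder`, `input.txt`) are made up for the demonstration and are not the contents of the Zenodo archive.

```shell
# Build a tiny archive with a single top-level folder (hypothetical names).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/src/top_folder" "$demo_dir/plain" "$demo_dir/stripped"
echo "demo" > "$demo_dir/src/top_folder/input.txt"
tar -czf "$demo_dir/demo.tar.gz" -C "$demo_dir/src" top_folder

# List the archive members without extracting anything.
tar -tzf "$demo_dir/demo.tar.gz"

# Without the flag, files are extracted under top_folder/.
tar -xzf "$demo_dir/demo.tar.gz" -C "$demo_dir/plain"

# With the flag, the first path component is removed on extraction.
tar --strip-components=1 -xzf "$demo_dir/demo.tar.gz" -C "$demo_dir/stripped"

ls "$demo_dir/plain" "$demo_dir/stripped"
```

Running `tar -tzf` on the downloaded archive first is a cheap way to see its layout before extracting.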

File Format¶

NanoString CosMx SMI produces single‑molecule spatial transcriptomics data as a comma‑separated values (CSV) table.

CSV File Format

"fov","cell_ID","x_global_px","y_global_px","x_local_px","y_local_px","z","target","CellComp"
64,0,-473043,7954.533,4015.3,4246.2,1,"Gfap","None"
64,0,-473022.9,7902.723,4035.48,4194.39,1,"Fth1","None"
64,0,-473132,7836.476,3926.34,4128.143,1,"Ptn","None"

fov: The field of view (FOV) number.
cell_ID: Unique identifier for a single cell within a given FOV; 0 for background or unassigned molecules.
x_global_px, y_global_px: Global pixel coordinates relative to the tissue.
x_local_px, y_local_px: Local pixel coordinates relative to the given FOV.
z: Z-plane index representing the depth (optical section) where the transcript was detected.
target: Target name.
CellComp: Subcellular location of the target.
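The column layout above can be inspected with standard tools. This sketch writes the three example rows to a temporary file and summarizes them with `awk`; it assumes a simple CSV with no commas inside quoted fields, and the variable names are illustrative only.

```shell
# Three rows copied from the excerpt above, stored in a temporary file.
sample_csv=$(mktemp)
cat > "$sample_csv" <<'EOF'
"fov","cell_ID","x_global_px","y_global_px","x_local_px","y_local_px","z","target","CellComp"
64,0,-473043,7954.533,4015.3,4246.2,1,"Gfap","None"
64,0,-473022.9,7902.723,4035.48,4194.39,1,"Fth1","None"
64,0,-473132,7836.476,3926.34,4128.143,1,"Ptn","None"
EOF

# Count molecules per target (column 8), stripping the CSV quotes first.
per_target=$(awk -F',' 'NR > 1 { gsub(/"/, "", $8); n[$8]++ }
                        END { for (t in n) print t, n[t] }' "$sample_csv" | sort)
echo "$per_target"

# Count molecules not assigned to any cell (cell_ID == 0, column 2).
unassigned=$(awk -F',' 'NR > 1 && $2 == 0 { c++ } END { print c + 0 }' "$sample_csv")
echo "unassigned molecules: $unassigned"
```

For a gzipped table, the same `awk` programs can read from `gzip -cd file.csv.gz` through a pipe.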

Define ID and Parameters¶

# Unique identifier for your dataset
DATA_ID="cosmxsmi_hippo"                # change this to reflect your dataset name
PLATFORM="cosmx_smi"                    # platform information
SCALE=$(echo 1000/120|bc -l)            # pixels per micrometer (1000 nm / 120 nm per pixel)

# LDA parameters
train_width=12                           # define LDA training hexagon width (comma-separated if multiple widths are applied)
n_factor=6,12                            # define number of factors in LDA training (comma-separated if multiple n-factor values are provided)

How to define the scaling factor for CosMx SMI?

According to the README.html provided with the example CosMx dataset, each pixel has an edge length of 120 nm. Since 1 µm is 1000 nm, the number of pixels per micrometer is: scale = 1000 / 120 ≈ 8.33.
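As a worked example of this arithmetic, the sketch below computes the scale with `awk` (used here in place of `bc` purely for portability) and converts one pixel coordinate, the `x_local_px` value 4015.3 from the first example row, into micrometers.

```shell
# Pixels per micrometer: 1000 nm per um divided by 120 nm per pixel.
scale=$(awk 'BEGIN { printf "%.6f", 1000 / 120 }')
echo "pixels per um: $scale"

# Dividing a pixel coordinate by the scale gives its position in micrometers.
x_um=$(awk -v px=4015.3 'BEGIN { printf "%.3f", px / (1000 / 120) }')
echo "4015.3 px = $x_um um"
```

Dividing by 1000/120 is the same as multiplying by 0.12 µm per pixel, which is a quick way to sanity-check converted coordinates.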

SGE Format Conversion¶

Convert the raw input to the unified SGE format. See the Reference page for details.

cartloader sge_convert \
  --makefn sge_convert.mk \
  --platform ${PLATFORM} \
  --in-csv ./input.tsv.gz \
  --units-per-um ${SCALE} \
  --out-dir ./sge \
  --exclude-feature-regex '^(BLANK|Neg|Intergenic|Deprecated|Unassigned)' \
  --sge-visual \
  --spatula ${spatula} \
  --n-jobs ${n_jobs}

Parameter	Required	Type	Description
`--platform`	required	string	Platform (options: "`10x_visium_hd`", "`seqscope`", "`10x_xenium`", "`bgi_stereoseq`", "`cosmx_smi`", "`vizgen_merscope`", "`pixel_seq`", "`generic`")
`--in-csv`	required	string	Path to the input TSV/CSV file
`--units-per-um`	required	float	Number of coordinate units per micrometer, used to convert coordinates to microns (default: `1.0`)
`--out-dir`	required	string	Output directory for the converted SGE files
`--makefn`		string	File name for the generated Makefile (default: `sge_convert.mk`)
`--exclude-feature-regex`		regex	Pattern to exclude control features
`--sge-visual`		flag	Enable SGE visualization step (generates diagnostic image) (default: `FALSE`)
`--spatula`		string	Path to the spatula binary (default: `spatula`)
`--n-jobs`		int	Number of parallel jobs for processing (default: `1`)
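Once the conversion has finished, a quick check that the expected files exist can save a failed run later. This is a minimal sketch: `check_outputs` is a hypothetical helper, and the file names are the ones the FICTURE command in the next section reads from `./sge`.

```shell
# Hypothetical helper: print each missing or empty file; succeed only if none are.
check_outputs() {
  local dir=$1 f missing=0
  shift
  for f in "$@"; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Files that the FICTURE step expects to find in the SGE output directory.
if check_outputs ./sge transcripts.unsorted.tsv.gz feature.clean.tsv.gz coordinate_minmax.tsv; then
  echo "SGE outputs look complete"
else
  echo "SGE outputs incomplete; re-check the sge_convert run"
fi
```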

`FICTURE` Analysis¶

Compute spatial factors using punkst (FICTURE2). See more details on the Reference page.

cartloader run_ficture2 \
  --makefn run_ficture2.mk \
  --main \
  --in-transcript ./sge/transcripts.unsorted.tsv.gz \
  --in-feature ./sge/feature.clean.tsv.gz \
  --in-minmax ./sge/coordinate_minmax.tsv \
  --cmap-file ${cmap} \
  --exclude-feature-regex '^(mt-.*$|Gm\d+$)' \
  --out-dir ./ficture2 \
  --width ${train_width} \
  --n-factor ${n_factor} \
  --spatula ${spatula} \
  --ficture2 ${punkst} \
  --n-jobs ${n_jobs} \
  --threads ${n_jobs}

Parameter	Required	Type	Description
`--main`	required ¹	flag	Enable `CartLoader` to run all five steps
`--in-transcript`	required	string	Path to input transcript-level SGE file
`--out-dir`	required	string	Path to output directory
`--width`	required	int or comma-separated list	LDA training hexagon width(s)
`--n-factor`	required	int or comma-separated list	Number of LDA factors
`--makefn`		string	File name for the generated Makefile (default: `run_ficture2.mk` )
`--in-feature`		string	Path to input feature file
`--in-minmax`		string	Path to input coordinate min/max file
`--cmap-file`		string	Path to color map file
`--exclude-feature-regex`		regex	Pattern to exclude features
`--spatula`		string	Path to the `spatula` binary (default: `spatula`)
`--ficture2`		string	Path to the `punkst` directory (defaults to `punkst` repository within `submodules` directory of `CartLoader`)
`--n-jobs`		int	Number of parallel jobs (default: `1`)
`--threads`		int	Number of threads per job (default: `1`)

¹ CartLoader requires the user to specify at least one action. Available actions: `--tile` to run the tiling step; `--segment` to run the segmentation step; `--init-lda` to run the LDA training step; `--decode` to run the decoding step; `--summary` to run the summarization step; `--main` to run all five of these actions.
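To see what the two `--exclude-feature-regex` patterns used in this tutorial match, they can be tried against a few names with `grep -E`. This is only an illustration of the expressions themselves, not of how CartLoader evaluates them internally. The feature names are hypothetical, and because `grep -E` does not support `\d`, the sketch substitutes `[0-9]`.

```shell
# Hypothetical feature names, one per line.
features='Gfap
mt-Co1
Gm12345
Gmnn
NegPrb1'

# FICTURE step pattern, with \d rewritten as [0-9] for grep -E.
# -v keeps the lines that do NOT match, i.e. the features that would be retained.
kept=$(printf '%s\n' "$features" | grep -Ev '^(mt-.*$|Gm[0-9]+$)')
echo "after FICTURE pattern:"
echo "$kept"

# SGE conversion pattern, applied to the remaining names.
kept_sge=$(printf '%s\n' "$kept" | grep -Ev '^(BLANK|Neg|Intergenic|Deprecated|Unassigned)')
echo "after SGE pattern:"
echo "$kept_sge"
```

Note that `Gmnn` survives the first pattern because `Gm` must be followed only by digits to match, while `NegPrb1` is removed only by the SGE pattern.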

`CartLoader` Asset Packaging¶

Generate PMTiles and web-compatible tile directories. See the Reference page for details.

Two variants are shown: run_cartload2 with FICTURE output (Example A) and run_cartload2 with SGE only (Example B).

# Example A: With FICTURE outputs (integrates factors + joins)
cartloader run_cartload2 \
  --makefn run_cartload2.mk \
  --fic-dir ./ficture2 \
  --out-dir ./cartload2 \
  --id ${DATA_ID} \
  --spatula ${spatula} \
  --pmtiles ${pmtiles} \
  --tippecanoe ${tippecanoe} \
  --n-jobs ${n_jobs} \
  --threads ${n_jobs}

# Example B: SGE-only (package molecules without FICTURE)
cartloader run_cartload2 \
  --makefn run_cartload2.mk \
  --sge-dir ./sge \
  --out-dir ./cartload2 \
  --id ${DATA_ID} \
  --spatula ${spatula} \
  --pmtiles ${pmtiles} \
  --tippecanoe ${tippecanoe} \
  --n-jobs ${n_jobs} \
  --threads ${n_jobs}

Parameter	Required	Type	Description
`--out-dir`	required	string	Path to the output directory for PMTiles and web tiles
`--id`	required	string	Dataset ID used for naming outputs and metadata
`--fic-dir`		string	Path to FICTURE outputs (enables factor layers + molecule–factor joins)
`--sge-dir`		string	Path to SGE outputs from `sge_convert` (enables SGE-only packaging)
`--in-sge-assets`		string	File name of SGE assets JSON/YAML in `--sge-dir` (default: `sge_assets.json`)
`--in-fic-params`		string	File name of FICTURE params JSON/YAML in `--fic-dir` (default: `ficture.params.json`)
`--makefn`		string	File name for the generated Makefile (default: `run_cartload2.mk`)
`--spatula`		string	Path to the `spatula` binary (default: `spatula`)
`--pmtiles`		string	Path to the `pmtiles` binary (default: `pmtiles`)
`--tippecanoe`		string	Path to the `tippecanoe` binary (default: `tippecanoe`)
`--n-jobs`		int	Number of parallel jobs (default: `1`)
`--threads`		int	Number of threads per job (default: `4`)
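After packaging, a short summary of the output directory shows whether anything was produced. The sketch below assumes the catalog is written as `catalog.yaml` at the top level of `--out-dir` and that tile archives use the `.pmtiles` extension; `summarize_assets` is a hypothetical helper.

```shell
# Hypothetical helper: report the catalog file and the number of PMTiles archives.
summarize_assets() {
  local dir=$1 n
  if [ -f "$dir/catalog.yaml" ]; then
    echo "catalog: $dir/catalog.yaml"
  else
    echo "catalog: not found in $dir"
  fi
  # Count *.pmtiles files anywhere under the directory (0 if it does not exist).
  n=$(find "$dir" -type f -name '*.pmtiles' 2>/dev/null | wc -l | tr -d ' ')
  echo "pmtiles files: $n"
}

summarize_assets ./cartload2
```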

Upload to Data Repository¶

Choose a data repository to host/share your output

CartLoader supports two upload options (AWS and Zenodo) for storing PMTiles of SGE and spatial factors in a data repository.

Choose the one that best suits your needs.

AWS uploads and Zenodo uploads are described in turn below.

Upload the generated CartLoader outputs to your designated AWS S3 directory:

# AWS S3 target location
S3_DIR=/s3/path/to/s3/dir              # we recommend using DATA_ID as the directory name, e.g., s3://bucket_name/test-data

cartloader upload_aws \
  --in-dir ./cartload2 \
  --s3-dir "${S3_DIR}" \
  --aws ${aws} \
  --n-jobs ${n_jobs}

Parameter	Required	Type	Description
`--in-dir`	required	string	Path to the input directory containing the `CartLoader` asset packaging output
`--s3-dir`	required	string	Path to the target S3 directory for uploading
`--aws`		string	Path to the AWS CLI binary
`--n-jobs`		int	Number of parallel jobs
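Because the placeholder path above is not itself a valid S3 location, it may be worth checking the value before uploading. This sketch assumes the target should be an `s3://` URI, as in the example `s3://bucket_name/test-data`; `is_s3_uri` is a hypothetical helper.

```shell
# Hypothetical helper: succeed only if the argument starts with s3:// and has a bucket name.
is_s3_uri() {
  case $1 in
    s3://?*) return 0 ;;
    *)       return 1 ;;
  esac
}

# Use the value from the environment if set, otherwise the example URI.
S3_DIR=${S3_DIR:-s3://bucket_name/test-data}
if is_s3_uri "$S3_DIR"; then
  echo "S3 target looks valid: $S3_DIR"
else
  echo "S3 target should start with s3:// (got: $S3_DIR)" >&2
fi
```

After the upload, `aws s3 ls "${S3_DIR}/"` lists what landed in the target location.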

For Zenodo, upload the generated CartLoader outputs to an existing Zenodo deposition or to a new one.

zenodo_token=/path/to/zenodo/token/file    # replace /path/to/zenodo/token/file with the path to your Zenodo token file

cartloader upload_zenodo \
  --in-dir ./cartload2 \
  --upload-method catalog \
  --zenodo-token $zenodo_token \
  --title  "Your Title" \
  --creators "Your Name" \
  --description "This is an example description"

Parameter	Required	Type	Description
`--in-dir`	required	string	Path to the input directory containing the `CartLoader` asset packaging output
`--upload-method`	required	string	Method to determine which files to upload. Options: `all` to upload all files in `--in-dir`; `catalog` to upload files listed in a catalog YAML file; `user_list` to upload files explicitly listed via `--in-list`
`--catalog-yaml`		string	Used with `--upload-method catalog`. Path to the `catalog.yaml` generated by `run_cartload2`. If omitted, the catalog in the input directory specified by `--in-dir` is used.
`--zenodo-token`	required	string	Path to your Zenodo access token file
`--title`	required	string	Required when creating a new deposition (i.e., if `--zenodo-deposition-id` is omitted). Title for the new Zenodo deposition.
`--creators`	required	list of str	List of creators in "Lastname, Firstname" format.
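Since `--zenodo-token` takes a file path, a small pre-flight check on that file can catch a wrong path early. The sketch below is an assumption-laden convenience, not part of CartLoader: `check_token_file` is a hypothetical helper that verifies the file exists and is non-empty, then restricts its permissions to the owner.

```shell
# Hypothetical helper: validate the token file and make it readable by the owner only.
check_token_file() {
  local f=$1
  [ -f "$f" ] || { echo "no such file: $f" >&2; return 1; }
  [ -s "$f" ] || { echo "file is empty: $f" >&2; return 1; }
  chmod 600 "$f"
}

# Fall back to the placeholder path when zenodo_token has not been set.
if check_token_file "${zenodo_token:-/path/to/zenodo/token/file}"; then
  echo "token file looks usable"
else
  echo "fix the token file before running upload_zenodo"
fi
```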

Output Data¶

View/Explore¶

The outputs are available in both CartoScope and Zenodo.

Explore in CartoScope

Download from Zenodo

See output details in the reference pages for run_ficture2 and run_cartload2.