CosMx SMI Starter Tutorial

Input Data

The input data is from an adult mouse hippocampus, extracted by masking a coronal brain section. The original full-section

File Format

NanoString CosMx SMI produces single‑molecule spatial transcriptomics data as a comma‑separated values (CSV) table.

CSV File Format

"fov","cell_ID","x_global_px","y_global_px","x_local_px","y_local_px","z","target","CellComp"
64,0,-473043,7954.533,4015.3,4246.2,1,"Gfap","None"
64,0,-473022.9,7902.723,4035.48,4194.39,1,"Fth1","None"
64,0,-473132,7836.476,3926.34,4128.143,1,"Ptn","None"
  • fov: The field of view (FOV) number.
  • cell_ID: Unique identifier for a single cell within a given FOV; 0 if background or unassigned molecules.
  • x_global_px, y_global_px: Global pixel coordinates relative to the tissue.
  • x_local_px, y_local_px: The x or y position (in pixels) relative to the given FOV.
  • z: Z-plane index representing the depth (optical section) where the transcript was detected.
  • target: Target name.
  • CellComp: Subcellular location of the target.
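
As a quick sanity check of the columns described above, the sketch below writes the three example records to a temporary file and summarizes them with awk. It assumes plain comma splitting is sufficient, which holds for these records but would break on quoted fields that contain commas; the variable names are illustrative only.

```shell
# Write the three example records from above to a temporary CSV.
sample_csv=$(mktemp)
cat > "$sample_csv" <<'EOF'
"fov","cell_ID","x_global_px","y_global_px","x_local_px","y_local_px","z","target","CellComp"
64,0,-473043,7954.533,4015.3,4246.2,1,"Gfap","None"
64,0,-473022.9,7902.723,4035.48,4194.39,1,"Fth1","None"
64,0,-473132,7836.476,3926.34,4128.143,1,"Ptn","None"
EOF

# Count molecules not assigned to a cell (cell_ID == 0), skipping the header row.
n_unassigned=$(awk -F',' 'NR > 1 && $2 == 0 { n++ } END { print n + 0 }' "$sample_csv")

# List the distinct targets (column 8) with the surrounding quotes removed.
targets=$(awk -F',' 'NR > 1 { gsub(/"/, "", $8); print $8 }' "$sample_csv" | sort -u | paste -sd' ' -)

echo "unassigned molecules: $n_unassigned"
echo "targets: $targets"
rm -f "$sample_csv"
```

All three example molecules have cell_ID 0, so they count as background or unassigned.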

Data Access

The example data is hosted on Zenodo.

Follow the commands below to download the example data.

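work_dir=/path/to/work/directory        # replace with the path to your work directory
cd "$work_dir"
# Download the example archive from Zenodo
wget https://zenodo.org/records/17953582/files/cosmxsmi_starter.raw.tar.gz
# Extract into the work directory, dropping the archive's top-level folder
tar --strip-components=1 -zxvf cosmxsmi_starter.raw.tar.gz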

Set Up the Environment

Pre-installed tools

Make sure all required tools are installed (see Installation).

Define paths to all required binaries and resources. Optionally, specify a fixed color map for consistent rendering.

# ====
# Replace each placeholder with the actual path on your system.  
# ====

work_dir=/path/to/work/directory        # path to work directory that contains the downloaded input data
cd $work_dir

# Define paths to required binaries and resources
spatula=/path/to/spatula/binary         # path to spatula executable
punkst=/path/to/punkst/binary           # path to FICTURE2 (punkst) executable
tippecanoe=/path/to/tippecanoe/binary   # path to tippecanoe executable
pmtiles=/path/to/pmtiles/binary         # path to pmtiles executable
aws=/path/to/aws/cli/binary             # path to AWS CLI binary

# (Optional) Define path to color map. 
cmap=/path/to/color/map                 # Path to fixed color map. `CartLoader` includes one at cartloader/assets/fixed_color_map_256.tsv.

# Number of jobs
n_jobs=10                               # If not specified, the number of jobs defaults to 1.

# Activate the bioconda environment
conda activate ENV_NAME                 # replace ENV_NAME with your conda environment name
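
Before running the pipeline, it can help to confirm that each tool variable resolves to something executable. The following is a minimal sketch, not part of CartLoader: `check_tool` is a hypothetical helper, and the fallback to bare command names when a variable is unset is an assumption made so the snippet runs on its own.

```shell
# Return 0 if the given path or command name resolves to an executable, 1 otherwise.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok:      $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Check each tool variable defined above; fall back to the bare command name if unset.
n_missing=0
for tool in "${spatula:-spatula}" "${punkst:-punkst}" "${tippecanoe:-tippecanoe}" "${pmtiles:-pmtiles}" "${aws:-aws}"; do
  check_tool "$tool" || n_missing=$((n_missing + 1))
done
echo "tools not found: $n_missing"
```

A non-zero count usually means a placeholder path was left unchanged or the conda environment is not active.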

Define data ID and analysis parameters:

# Unique identifier for your dataset
DATA_ID="cosmxsmi_hippo"                # change this to reflect your dataset name
PLATFORM="cosmx_smi"                    # platform information
SCALE=$(echo "1000/120" | bc -l)        # coordinate units (pixels) per micrometer: 1000 nm / 120 nm per pixel

# LDA parameters
train_width=12                           # define LDA training hexagon width (comma-separated if multiple widths are applied)
n_factor=6,12                            # define number of factors in LDA training (comma-separated if multiple n-factor are applied)

How to Define Scaling Factors for CosMx SMI?

According to the README.html provided with the CosMx SMI dataset, each pixel has an edge length of 120 nm. One micrometer (1000 nm) therefore spans 1000 / 120 ≈ 8.33 pixels, and this pixels-per-micrometer value is what --units-per-um expects: scale = 1000 / 120.
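
The arithmetic can be checked with a short sketch. It uses awk instead of bc purely so that it runs where bc is not installed; the coordinate is the x_local_px value of the first example record, and the variable names are illustrative.

```shell
# 120 nm per pixel -> 1000 nm / 120 nm = pixels per micrometer.
nm_per_px=120
scale=$(LC_ALL=C awk -v nm="$nm_per_px" 'BEGIN { printf "%.4f", 1000 / nm }')

# Convert one pixel coordinate (x_local_px of the first example record) to micrometers.
x_px=4015.3
x_um=$(LC_ALL=C awk -v px="$x_px" -v nm="$nm_per_px" 'BEGIN { printf "%.3f", px * nm / 1000 }')

echo "pixels per um: $scale"
echo "x = $x_px px = $x_um um"
```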

SGE Format Conversion

Convert the raw input to the unified SGE format. See the Reference page for details.

cartloader sge_convert \
  --makefn sge_convert.mk \
  --platform ${PLATFORM} \
  --in-csv ./input.tsv.gz \
  --units-per-um ${SCALE} \
  --out-dir ./sge \
  --exclude-feature-regex '^(BLANK|Neg|Intergenic|Deprecated|Unassigned)' \
  --sge-visual \
  --spatula ${spatula} \
  --n-jobs ${n_jobs}
| Parameter | Required | Type | Description |
|---|---|---|---|
| --platform | required | string | Platform (options: "10x_visium_hd", "seqscope", "10x_xenium", "bgi_stereoseq", "cosmx_smi", "vizgen_merscope", "pixel_seq", "generic") |
| --in-csv | required | string | Path to the input TSV/CSV file |
| --units-per-um | required | float | Scale to convert coordinates to microns (default: 1.0) |
| --out-dir | required | string | Output directory for the converted SGE files |
| --makefn | | string | File name for the generated Makefile (default: sge_convert.mk) |
| --exclude-feature-regex | | regex | Pattern to exclude control features |
| --sge-visual | | flag | Enable SGE visualization step (generates diagnostic image) (default: FALSE) |
| --spatula | | string | Path to the spatula binary (default: spatula) |
| --n-jobs | | int | Number of parallel jobs for processing (default: 1) |
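
The FICTURE step in this tutorial reads three files from the SGE output directory (transcripts.unsorted.tsv.gz, feature.clean.tsv.gz, and coordinate_minmax.tsv). A small check like the one below can confirm they exist before moving on. `check_sge_outputs` is a hypothetical helper, not a CartLoader command, and the file list is taken only from the paths used in this tutorial.

```shell
# Report any expected SGE file that is missing or empty; return 1 if any check fails.
check_sge_outputs() {
  sge_dir=$1
  rc=0
  for f in transcripts.unsorted.tsv.gz feature.clean.tsv.gz coordinate_minmax.tsv; do
    if [ ! -s "$sge_dir/$f" ]; then
      echo "missing or empty: $sge_dir/$f" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage after sge_convert has finished:
check_sge_outputs ./sge || echo "SGE outputs incomplete; run sge_convert first"
```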

FICTURE Analysis

Compute spatial factors using punkst (FICTURE2 mode). See more details on the Reference page.

cartloader run_ficture2 \
  --makefn run_ficture2.mk \
  --main \
  --in-transcript ./sge/transcripts.unsorted.tsv.gz \
  --in-feature ./sge/feature.clean.tsv.gz \
  --in-minmax ./sge/coordinate_minmax.tsv \
  --cmap-file ${cmap} \
  --exclude-feature-regex '^(mt-.*$|Gm\d+$)' \
  --out-dir ./ficture2 \
  --width ${train_width} \
  --n-factor ${n_factor} \
  --spatula ${spatula} \
  --ficture2 ${punkst} \
  --n-jobs ${n_jobs} \
  --threads ${n_jobs}
| Parameter | Required | Type | Description |
|---|---|---|---|
| --main | required [1] | flag | Enable CartLoader to run all five steps |
| --in-transcript | required | string | Path to input transcript-level SGE file |
| --out-dir | required | string | Path to output directory |
| --width | required | int or comma-separated list | LDA training hexagon width(s) |
| --n-factor | required | int or comma-separated list | Number of LDA factors |
| --makefn | | string | File name for the generated Makefile (default: run_ficture2.mk) |
| --in-feature | | string | Path to input feature file |
| --in-minmax | | string | Path to input coordinate min/max file |
| --cmap-file | | string | Path to color map file |
| --exclude-feature-regex | | regex | Pattern to exclude features |
| --spatula | | string | Path to the spatula binary (default: spatula) |
| --ficture2 | | string | Path to the punkst directory (defaults to punkst repository within submodules directory of CartLoader) |
| --n-jobs | | int | Number of parallel jobs (default: 1) |
| --threads | | int | Number of threads per job (default: 1) |

1: CartLoader requires the user to specify at least one action. Available actions are: --tile to run the tiling step; --segment to run the segmentation step; --init-lda to run the LDA training step; --decode to run the decoding step; --summary to run the summarization step; --main to run all five of these steps.
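
To see which feature names the exclusion pattern '^(mt-.*$|Gm\d+$)' is meant to drop, the sketch below applies an equivalent pattern with grep. Two assumptions apply: CartLoader is assumed to evaluate the pattern with an engine that understands \d, whereas grep -E does not, so \d is rewritten as [0-9]; and the gene names other than Gfap are illustrative inputs, not taken from the example data.

```shell
# POSIX ERE equivalent of '^(mt-.*$|Gm\d+$)'; \d is written as [0-9] for grep -E.
exclude_regex='^(mt-.*$|Gm[0-9]+$)'

# Illustrative names: the first two should be excluded, the last two kept.
kept=$(printf '%s\n' mt-Co1 Gm12345 Gfap Gmnn | grep -Ev "$exclude_regex" | paste -sd' ' -)
echo "kept: $kept"
```

Note that Gmnn is kept: the second alternative only matches "Gm" followed exclusively by digits.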

CartLoader Asset Packaging

Generate PMTiles and web-compatible tile directories. See the Reference page for details.

# Example A: With FICTURE outputs (integrates factors + joins)
cartloader run_cartload2 \
  --makefn run_cartload2.mk \
  --fic-dir ./ficture2 \
  --out-dir ./cartload2 \
  --id ${DATA_ID} \
  --spatula ${spatula} \
  --pmtiles ${pmtiles} \
  --tippecanoe ${tippecanoe} \
  --n-jobs ${n_jobs} \
  --threads ${n_jobs}

# Example B: SGE-only (package molecules without FICTURE)
cartloader run_cartload2 \
  --makefn run_cartload2.mk \
  --sge-dir ./sge \
  --out-dir ./cartload2 \
  --id ${DATA_ID} \
  --spatula ${spatula} \
  --pmtiles ${pmtiles} \
  --tippecanoe ${tippecanoe} \
  --n-jobs ${n_jobs} \
  --threads ${n_jobs}
| Parameter | Required | Type | Description |
|---|---|---|---|
| --out-dir | required | string | Path to the output directory for PMTiles and web tiles |
| --id | required | string | Dataset ID used for naming outputs and metadata |
| --fic-dir | | string | Path to FICTURE outputs (enables factor layers + molecule–factor joins) |
| --sge-dir | | string | Path to SGE outputs from sge_convert (enables SGE-only packaging) |
| --in-sge-assets | | string | File name of SGE assets JSON/YAML in --sge-dir (default: sge_assets.json) |
| --in-fic-params | | string | File name of FICTURE params JSON/YAML in --fic-dir (default: ficture.params.json) |
| --makefn | | string | File name for the generated Makefile (default: run_cartload2.mk) |
| --spatula | | string | Path to the spatula binary (default: spatula) |
| --pmtiles | | string | Path to the pmtiles binary (default: pmtiles) |
| --tippecanoe | | string | Path to the tippecanoe binary (default: tippecanoe) |
| --n-jobs | | int | Number of parallel jobs (default: 1) |
| --threads | | int | Number of threads per job (default: 4) |

Upload to Data Repository

Choose a data repository to host/share your output

CartLoader supports two upload options (AWS and Zenodo) for storing PMTiles of SGE and spatial factors in a data repository.

Choose the one that best suits your needs.

AWS Uploads

Upload the generated CartLoader outputs to your designated AWS S3 directory:

# AWS S3 target location for cartostore
S3_DIR=/s3/path/to/s3/dir              # Recommended: use DATA_ID as the directory name, e.g., s3://bucket_name/cosmxsmi_hippo

cartloader upload_aws \
  --in-dir ./cartload2 \
  --s3-dir "${S3_DIR}" \
  --aws ${aws} \
  --n-jobs ${n_jobs}
| Parameter | Required | Type | Description |
|---|---|---|---|
| --in-dir | required | string | Path to the input directory containing the CartLoader asset packaging output |
| --s3-dir | required | string | Path to the target S3 directory for uploading |
| --aws | | string | Path to the AWS CLI binary |
| --n-jobs | | int | Number of parallel jobs |
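
One way to follow the naming recommendation is to derive the target from the dataset ID rather than typing it by hand. This is only a sketch: `S3_BUCKET` is a hypothetical variable, "bucket_name" is a placeholder, and the s3:// prefix check assumes the target is given as an S3 URI, as in the example comment above.

```shell
DATA_ID="cosmxsmi_hippo"     # dataset ID defined earlier in this tutorial
S3_BUCKET="bucket_name"      # hypothetical placeholder; replace with your own bucket
S3_DIR="s3://${S3_BUCKET}/${DATA_ID}"

# Guard against a local-looking path slipping in before uploading.
case "$S3_DIR" in
  s3://*) echo "upload target: $S3_DIR" ;;
  *)      echo "S3_DIR should start with s3://" >&2 ;;
esac
```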

Zenodo Uploads

Upload the generated CartLoader outputs to your designated Zenodo deposition or a new deposition.

zenodo_token=/path/to/zenodo/token/file    # replace with the path to your Zenodo token file

cartloader upload_zenodo \
  --in-dir ./cartload2 \
  --upload-method catalog \
  --zenodo-token $zenodo_token \
  --title  "Your Title" \
  --creators "Your Name" \
  --description "This is an example description"
| Parameter | Required | Type | Description |
|---|---|---|---|
| --in-dir | required | string | Path to the input directory containing the CartLoader asset packaging output |
| --upload-method | required | string | Method to determine which files to upload. Options: all to upload all files in --in-dir; catalog to upload files listed in a catalog YAML file; user_list to upload files explicitly listed via --in-list |
| --catalog-yaml | | string | Used with --upload-method catalog. Path to the catalog.yaml generated by run_cartload2. If omitted, the catalog in the input directory specified by --in-dir is used. |
| --zenodo-token | required | string | Path to your Zenodo access token file |
| --title | required | string | Title for the new Zenodo deposition. Required when creating a new deposition (i.e., if --zenodo-deposition-id is omitted). |
| --creators | required | list of str | List of creators in "Lastname, Firstname" format. |

Output Data

See more details of output at the Reference pages for run_ficture2 and run_cartload2.