CosMx SMI Starter Tutorial

Input Data

The input data is from an adult mouse hippocampus, extracted by masking a coronal brain section. The original full-section

File Format

The CosMx SMI by NanoString generates high-resolution spatial transcriptomics data at single-molecule resolution, delivered as a comma-separated values (CSV) table with one row per detected transcript.

CSV File Format

"fov","cell_ID","x_global_px","y_global_px","x_local_px","y_local_px","z","target","CellComp"
64,0,-473043,7954.533,4015.3,4246.2,1,"Gfap","None"
64,0,-473022.9,7902.723,4035.48,4194.39,1,"Fth1","None"
64,0,-473132,7836.476,3926.34,4128.143,1,"Ptn","None"

fov: The field of view (FOV) number.
cell_ID: Unique identifier for a single cell within a given FOV. 0 if background or unassigned molecules.
x_global_px, y_global_px: Global pixel coordinates relative to the tissue.
x_local_px, y_local_px: The x or y position (in pixels) relative to the given FOV.
z: Z-plane index representing the depth (optical section) where the transcript was detected.
target: Name of the target.
CellComp: Subcellular location of target.
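
As a quick way to get familiar with these columns, the sketch below tallies molecules per target and counts unassigned molecules (`cell_ID` equal to 0). It is only an illustration: it writes the three example rows shown above to a temporary file and assumes that fields never contain embedded commas, so a plain comma split in `awk` is sufficient.

```shell
# Write the three example rows from above to a temporary file (illustrative only).
sample_csv=$(mktemp)
cat > "$sample_csv" <<'EOF'
"fov","cell_ID","x_global_px","y_global_px","x_local_px","y_local_px","z","target","CellComp"
64,0,-473043,7954.533,4015.3,4246.2,1,"Gfap","None"
64,0,-473022.9,7902.723,4035.48,4194.39,1,"Fth1","None"
64,0,-473132,7836.476,3926.34,4128.143,1,"Ptn","None"
EOF

# Count molecules per target (column 8), stripping the surrounding quotes.
target_counts=$(awk -F',' 'NR > 1 { gsub(/"/, "", $8); n[$8]++ }
                           END { for (t in n) print t "\t" n[t] }' "$sample_csv" | sort)
echo "$target_counts"

# Count molecules with cell_ID == 0 (background or unassigned).
n_unassigned=$(awk -F',' 'NR > 1 && $2 == 0 { c++ } END { print c + 0 }' "$sample_csv")
echo "unassigned molecules: $n_unassigned"

rm -f "$sample_csv"
```

The same two `awk` commands can be pointed at the real input file; for a gzip-compressed file, pipe it through `gzip -cd` first.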

Data Access

The example data is hosted on Zenodo.

Follow the commands below to download the example data.

work_dir=/path/to/work/directory
cd $work_dir
wget https://zenodo.org/records/15701394/files/cosmxsmi_starter.raw.tar.gz
tar --strip-components=1 -zxvf cosmxsmi_starter.raw.tar.gz

Set Up the Environment

Define the paths to all required binaries and resources, and the target AWS S3 bucket. Optionally, specify a fixed color map for consistent rendering.

# ====
# Replace each placeholder with the actual path on your system.  
# ====
work_dir=/path/to/work/directory        # path to work directory that contains the downloaded input data
cd $work_dir

# Define paths to required binaries and resources
spatula=/path/to/spatula/binary         # path to spatula executable
punkst=/path/to/punkst/binary           # path to FICTURE2/punkst executable
tippecanoe=/path/to/tippecanoe/binary   # path to tippecanoe executable
pmtiles=/path/to/pmtiles/binary         # path to pmtiles executable
aws=/path/to/aws/cli/binary             # path to AWS CLI binary

# (Optional) Define path to color map. 
cmap=/path/to/color/map                 # Path to the fixed color map for rendering. cartloader provides a fixed color map at cartloader/assets/fixed_color_map_256.tsv.

# AWS S3 target location for cartostore
AWS_BUCKET="EXAMPLE_AWS_BUCKET"         # replace EXAMPLE_AWS_BUCKET with your actual S3 bucket name

# Activate the bioconda environment
conda activate BIOENV_NAME              # replace BIOENV_NAME with your bioconda environment name
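
Before moving on, it can help to confirm that each path variable actually points to something executable. The following is a minimal sketch, not part of cartloader: `check_tools` is a hypothetical helper that takes `NAME=PATH` pairs and reports which ones are missing.

```shell
# Hypothetical helper (not part of cartloader): check that each NAME=PATH pair
# resolves to an executable file or to a command found on PATH.
check_tools() {
  missing=0
  for pair in "$@"; do
    name=${pair%%=*}     # text before the first '='
    path=${pair#*=}      # text after the first '='
    if [ -x "$path" ] || command -v "$path" >/dev/null 2>&1; then
      echo "ok       $name -> $path"
    else
      echo "MISSING  $name -> $path"
      missing=$((missing + 1))
    fi
  done
  # Succeed only if nothing was missing.
  [ "$missing" -eq 0 ]
}

# Example call using the variables defined above; unset variables are reported as MISSING.
check_tools "spatula=${spatula:-}" "punkst=${punkst:-}" "tippecanoe=${tippecanoe:-}" \
            "pmtiles=${pmtiles:-}" "aws=${aws:-}" \
  || echo "Fix the paths marked MISSING before continuing."
```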

Define data ID and analysis parameters:

# Unique identifier for your dataset
DATA_ID="cosmxsmi_hippo"                # change this to reflect your dataset name
PLATFORM="cosmx_smi"                    # platform information
SCALE=$(echo 1000/120|bc -l)              # coordinate units (pixels) per micrometer

# LDA parameters
train_width=12                           # LDA training hexagon width (comma-separated if multiple widths are used)
n_factor=6,12                            # number of factors in LDA training (comma-separated if multiple values are used)

How to Define Scaling Factors for CosMx SMI?

According to the README.html provided with the CosMx SMI dataset, each pixel has an edge length of 120 nm. Since 1 µm equals 1000 nm, the number of pixels per micrometer is scale = 1000 / 120 ≈ 8.33.
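
The arithmetic can be checked with a short sketch. It uses `awk` rather than `bc` purely for convenience, and the variable names are illustrative; the 120 nm pixel size is the value quoted above, and the coordinate is the `x_local_px` value from the first example row.

```shell
px_size_nm=120   # pixel edge length in nanometers, as stated in the README.html

# Pixels per micrometer: 1000 nm per micrometer divided by nm per pixel.
scale=$(awk -v nm="$px_size_nm" 'BEGIN { printf "%.4f", 1000 / nm }')
echo "pixels per micrometer: $scale"      # prints 8.3333

# Convert one pixel coordinate to micrometers (equivalent to dividing by the scale).
x_px=4015.3
x_um=$(awk -v x="$x_px" -v nm="$px_size_nm" 'BEGIN { printf "%.2f", x * nm / 1000 }')
echo "${x_px} px = ${x_um} um"            # prints 4015.3 px = 481.84 um
```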

SGE Format Conversion

Convert the raw input to the unified SGE format. See the Reference page for more details.

cartloader sge_convert \
  --makefn sge_convert.mk \
  --platform ${PLATFORM} \
  --in-csv ./input.tsv.gz \
  --units-per-um ${SCALE} \
  --out-dir ./sge \
  --exclude-feature-regex '^(BLANK|NegCon|NegPrb)' \
  --sge-visual \
  --spatula ${spatula} \
  --n-jobs 10

Parameter	Required	Type	Description
`--platform`	required	string	Platform (options: "`10x_visium_hd`", "`seqscope`", "`10x_xenium`", "`bgi_stereoseq`", "`cosmx_smi`", "`vizgen_merscope`", "`pixel_seq`", "`generic`")
`--in-csv`	required	string	Path to the input TSV/CSV file
`--units-per-um`	required	float	Scale to convert coordinates to microns (default: `1.0`)
`--out-dir`	required	string	Output directory for the converted SGE files
`--makefn`		string	File name for the generated Makefile (default: `sge_convert.mk`)
`--exclude-feature-regex`		regex	Pattern to exclude control features
`--sge-visual`		flag	Enable SGE visualization step (generates diagnostic image) (default: `FALSE`)
`--spatula`		string	Path to the spatula binary (default: `spatula`)
`--n-jobs`		int	Number of parallel jobs for processing (default: `1`)
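
To preview what a pattern such as the one passed to `--exclude-feature-regex` would drop, the sketch below applies it to a short list of names with `grep -E`. This assumes that cartloader's regex handling agrees with POSIX extended regular expressions for this simple anchored alternation. `Gfap`, `Fth1`, and `Ptn` come from the example rows above; `NegPrb1` and `BLANK_01` are hypothetical control-probe names used only for illustration.

```shell
regex='^(BLANK|NegCon|NegPrb)'

# -v inverts the match, so only names that do NOT match the pattern are kept.
kept=$(printf '%s\n' Gfap Fth1 Ptn NegPrb1 BLANK_01 | grep -Ev "$regex")
echo "kept features:"
echo "$kept"

# Without -v, the same pattern lists what would be excluded.
dropped=$(printf '%s\n' Gfap Fth1 Ptn NegPrb1 BLANK_01 | grep -E "$regex")
echo "excluded features:"
echo "$dropped"
```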

`FICTURE` Analysis

Compute spatial factors using punkst (FICTURE2 mode). See the Reference page for more details.

cartloader run_ficture2 \
  --makefn run_ficture2.mk \
  --main \
  --in-transcript ./sge/transcripts.unsorted.tsv.gz \
  --in-feature ./sge/feature.clean.tsv.gz \
  --in-minmax ./sge/coordinate_minmax.tsv \
  --cmap-file ${cmap} \
  --exclude-feature-regex '^(mt-.*$|Gm\d+$)' \
  --out-dir ./ficture2 \
  --width ${train_width} \
  --n-factor ${n_factor} \
  --spatula ${spatula} \
  --ficture2 ${punkst} \
  --n-jobs 10 \
  --threads 10

Parameter	Required	Type	Description
`--main`	required ¹	flag	Enable `cartloader` to run all five steps
`--in-transcript`	required	string	Path to input transcript-level SGE file
`--out-dir`	required	string	Path to output directory
`--width`	required	int or comma-separated list	LDA training hexagon width(s)
`--n-factor`	required	int or comma-separated list	Number of LDA factors
`--makefn`		string	File name for the generated Makefile (default: `run_ficture2.mk` )
`--in-feature`		string	Path to input feature file
`--in-minmax`		string	Path to input coordinate min/max file
`--cmap-file`		string	Path to color map file
`--exclude-feature-regex`		regex	Pattern to exclude features
`--spatula`		string	Path to the `spatula` binary (default: `spatula`)
`--ficture2`		string	Path to the `punkst` directory (defaults to `punkst` repository within `submodules` directory of `cartloader`)
`--n-jobs`		int	Number of parallel jobs (default: `1`)
`--threads`		int	Number of threads per job (default: `1`)

¹ `cartloader` requires the user to specify at least one action. Available actions include: `--tile` to run the tiling step; `--segment` to run the segmentation step; `--init-lda` to run the LDA training step; `--decode` to run the decoding step; `--summary` to run the summarization step; `--main` to run all five of the above actions.
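
Because `--width` and `--n-factor` both accept comma-separated lists, it is worth knowing how many models a given setting implies. The sketch below enumerates the pairs for the values used in this tutorial. It assumes that every width is combined with every factor count, which is an assumption about `cartloader`'s behavior rather than something stated above.

```shell
train_width=12
n_factor=6,12

# Split each comma-separated list and enumerate every (width, n_factor) pair.
n_combos=0
for w in $(printf '%s' "$train_width" | tr ',' ' '); do
  for k in $(printf '%s' "$n_factor" | tr ',' ' '); do
    echo "width=${w} n_factor=${k}"
    n_combos=$((n_combos + 1))
  done
done
echo "total combinations: ${n_combos}"
```

With the tutorial settings this lists two pairs, so adding more widths or factor counts multiplies the training work accordingly.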

`cartloader` Compilation

Generate PMTiles files and web-compatible tile directories. See the Reference page for more details.

cartloader run_cartload2 \
  --makefn run_cartload2.mk \
  --fic-dir ./ficture2 \
  --out-dir ./cartload2 \
  --id ${DATA_ID} \
  --spatula ${spatula} \
  --pmtiles ${pmtiles} \
  --tippecanoe ${tippecanoe} \
  --n-jobs 10 \
  --threads 10

Parameter	Required	Type	Description
`--fic-dir`	required	string	Path to the input directory containing FICTURE2 output
`--out-dir`	required	string	Path to the output directory for PMTiles and web tiles
`--id`	required	string	Dataset ID used for naming outputs and metadata
`--makefn`		string	File name for the generated Makefile (default: `run_cartload2.mk`)
`--spatula`		string	Path to the `spatula` binary (default: `spatula`)
`--pmtiles`		string	Path to the `pmtiles` binary (default: `pmtiles`)
`--tippecanoe`		string	Path to the `tippecanoe` binary (default: `tippecanoe`)
`--n-jobs`		int	Number of parallel jobs (default: `1`)
`--threads`		int	Number of threads per job (default: `1`)

Upload to Data Repository

AWS Uploads

Copy the generated cartloader outputs to your designated AWS S3 catalog path:

cartloader upload_aws \
  --in-dir ./cartload2 \
  --s3-dir "s3://${AWS_BUCKET}/${DATA_ID}" \
  --aws ${aws} \
  --n-jobs 10

Parameter	Required	Type	Description
`--in-dir`	required	string	Path to the input directory containing the cartloader compilation output
`--s3-dir`	required	string	Path to the target S3 directory for uploading
`--aws`		string	Path to the AWS CLI binary
`--n-jobs`		int	Number of parallel jobs