Seq-Scope Starter Tutorial

Input Data

This tutorial uses an example SGE from mouse hippocampus, extracted via spatial masking from a Seq-Scope coronal brain slice.

File Format

Actual input formats are platform-dependent. Please refer to the Vignettes for detailed input specifications for each platform.

Seq-Scope provides SGE as three files:

barcodes.tsv.gz – spatial barcode metadata

AAAACAAAAACCTTCTTCGGACACTGGTCT  1   20  1   1   295288  1422349 0,1,0,0,0
AAAACAAAAATCCTGTTATACATGCCATGG  2   45  1   1   1745544 1110720 2,2,1,0,1
AAAACAAAACACGGGAAAAAACTATAGGTG  3   58  1   1   887244  250820  7,7,5,0,1

Column 1: Sorted spatial barcodes
Column 2: 1-based integer index of spatial barcodes, used in matrix.mtx.gz
Column 3: 1-based integer index of the barcode within the full barcode list in the STARsolo output
Column 4: Lane ID (fixed as 1)
Column 5: Tile ID (fixed as 1)
Column 6: X-coordinates
Column 7: Y-coordinates
Column 8: Five comma-separated counts per spatial barcode for "Gene", "GeneFull", "Spliced", "Unspliced", and "Ambiguous".
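To make the column layout concrete, here is a minimal sketch that prints the barcode, X/Y coordinates, and the "Gene" count for the first few records. The function name `peek_barcodes` and the `./raw` path are illustrative assumptions, not part of cartloader.

```shell
# Print barcode, X, Y, and the "Gene" count (first value of column 8)
# for the first N records of a barcodes.tsv.gz file.
# peek_barcodes is a hypothetical helper, not a cartloader command.
peek_barcodes() {
  gzip -dc "$1" | awk -v n="${2:-3}" '
    NR > n { exit }
    {
      split($8, cnt, ",")        # cnt[1]=Gene, cnt[2]=GeneFull, ..., cnt[5]=Ambiguous
      print $1, $6, $7, cnt[1]
    }'
}

# Usage (path is an assumption based on the --in-mex ./raw argument used later):
# peek_barcodes ./raw/barcodes.tsv.gz 3
```

awk's default field splitting is used here so the sketch works whether the columns are separated by tabs or by spaces.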

features.tsv.gz – feature metadata

ENSMUSG00000100764  Gm29155 1   1,1,1,0,0
ENSMUSG00000100635  Gm29157 2   0,0,0,0,0
ENSMUSG00000100480  Gm29156 3   0,0,0,0,0

Column 1: Feature ID
Column 2: Feature symbol
Column 3: 1-based integer index of genes, used in matrix.mtx.gz
Column 4: Five comma-separated counts per gene for "Gene", "GeneFull", "Spliced", "Unspliced", and "Ambiguous".

matrix.mtx.gz – expression count matrix

%%MatrixMarket matrix coordinate integer general
%
33989 2928173 5404336
2487 1 0 1 0 0 0
5104 2 1 1 0 0 1

Header: The initial lines form the header, which declares that the file follows the Matrix Market (MTX) format and describes the matrix type. It may include comment lines (beginning with %) that carry extra metadata.
Dimensions: The first line after the header gives the matrix dimensions: the number of rows (features), columns (barcodes), and non-zero entries.
Data Entries: Each subsequent line lists one non-zero entry in seven columns: row index (feature index), column index (barcode index), and five expression counts corresponding to "Gene", "GeneFull", "Spliced", "Unspliced", and "Ambiguous".
- "Gene": represents unique, confidently mapped transcript count ("gene name"-based);
- "GeneFull": denotes total transcript count assigned to gene (includes ambiguities).
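As a quick sanity check on the three files, the dimensions line of the matrix can be compared with the number of records in the feature and barcode files. This is a minimal sketch that assumes one record per line and no header rows in the two TSV files; `check_mex` is a hypothetical helper, not a cartloader command.

```shell
# Compare the MTX dimensions line with the line counts of the two TSV files.
# Returns 0 when both counts agree with the matrix dimensions.
check_mex() {
  mex_dir=$1
  # First non-comment line: rows (features), columns (barcodes), non-zero entries.
  dims=$(gzip -dc "$mex_dir/matrix.mtx.gz" | awk '!/^%/ { print $1, $2, $3; exit }')
  n_feat=$(gzip -dc "$mex_dir/features.tsv.gz" | wc -l | tr -d ' ')
  n_bcd=$(gzip -dc "$mex_dir/barcodes.tsv.gz" | wc -l | tr -d ' ')
  set -- $dims                   # $1=features, $2=barcodes, $3=entries
  echo "matrix: features=$1 barcodes=$2 entries=$3; files: features=$n_feat barcodes=$n_bcd"
  [ "$1" = "$n_feat" ] && [ "$2" = "$n_bcd" ]
}

# Usage (path is an assumption): check_mex ./raw
```

A mismatch does not necessarily mean the data is broken, but it is worth resolving before running the conversion step.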

Data Access

The example data is hosted on Zenodo (DOI: 10.5281/zenodo.15786632).

Follow the commands below to download the example data.

work_dir=/path/to/work/directory
cd "$work_dir"
wget https://zenodo.org/records/15786632/files/seqscope_starter.raw.tar.gz
tar -zxvf seqscope_starter.raw.tar.gz

Set Up the Environment

Define paths to all required binaries and resources. Optionally, specify a fixed color map for consistent rendering.

# ====
# Replace each placeholder with the actual path on your system.  
# ====

work_dir=/path/to/work/directory        # path to work directory that contains the downloaded input data
cd "$work_dir"

# Define paths to required binaries and resources
spatula=/path/to/spatula/binary         # path to spatula executable
punkst=/path/to/punkst/binary           # path to FICTURE2/punkst executable
tippecanoe=/path/to/tippecanoe/binary   # path to tippecanoe executable
pmtiles=/path/to/pmtiles/binary         # path to pmtiles executable
aws=/path/to/aws/cli/binary             # path to AWS CLI binary

# (Optional) Define path to color map. 
cmap=/path/to/color/map                 # Path to the fixed color map for rendering. cartloader provides a fixed color map at cartloader/assets/fixed_color_map_256.tsv.

# Number of jobs
n_jobs=10                               # If not specified, the number of jobs defaults to 1.

# Activate the bioconda environment
conda activate ENV_NAME                 # replace ENV_NAME with your bioconda environment name
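Before moving on, it can help to confirm that each configured tool actually resolves to something executable. The sketch below is one way to do that; `check_tool` is a hypothetical helper, and the commented calls assume the variable names defined above.

```shell
# Report whether a tool resolves to an executable, either as a path or as a
# command name on PATH. $1 is a label for the message, $2 is the path or name.
check_tool() {
  if command -v "$2" >/dev/null 2>&1; then
    echo "ok: $1 -> $2"
  else
    echo "missing: $1 -> $2" >&2
    return 1
  fi
}

# Usage with the variables defined above:
# check_tool spatula    "$spatula"
# check_tool tippecanoe "$tippecanoe"
# check_tool pmtiles    "$pmtiles"
# check_tool aws        "$aws"
```

The punkst variable is left out of the usage lines because the parameter table for `run_ficture2` describes `--ficture2` as a directory, which this executable check would not cover.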

Define data ID and analysis parameters:

# Unique identifier for your dataset
DATA_ID="seqscope_hippo"                # change this to reflect your dataset name
PLATFORM="seqscope"                     # platform information
SCALE=1000                              # number of coordinate units per micrometer

# LDA parameters
train_width=18                          # LDA training hexagon width (comma-separated for multiple widths)
n_factor=6,12                           # number of LDA factors (comma-separated for multiple values)

How to Define Scaling Factors for Seq-Scope

The latest Seq-Scope protocol, which uses an Illumina NovaSeq 6000, processes sequencing data with the NovaScope pipeline. By default, NovaScope generates SGE at nanometer (nm) resolution, meaning each coordinate unit (pixel) corresponds to 1 nm.

Thus, use 1000 as the scaling factor from coordinate units to micrometers, since 1000 nm = 1 µm.
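The arithmetic can be checked on one of the example coordinates from barcodes.tsv.gz. This is a small illustrative sketch; `to_um` is a hypothetical helper, not part of cartloader.

```shell
SCALE=1000   # coordinate units (nm) per micrometer

# Convert a raw coordinate to micrometers by dividing by SCALE.
to_um() {
  awk -v v="$1" -v s="$SCALE" 'BEGIN { printf "%.3f\n", v / s }'
}

to_um 1422349   # Y-coordinate of the first example barcode; prints 1422.349
```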

SGE Format Conversion

Convert the raw input to the unified SGE format. See more details in SGE Format Conversion.

cartloader sge_convert \
  --makefn sge_convert.mk \
  --platform ${PLATFORM} \
  --in-mex ./raw \
  --units-per-um ${SCALE} \
  --out-dir ./sge \
  --exclude-feature-regex '^(BLANK|NegCon|NegPrb)' \
  --sge-visual \
  --spatula ${spatula} \
  --n-jobs ${n_jobs}

Parameter	Required	Type	Description
`--platform`	required	string	Platform (options: "`10x_visium_hd`", "`seqscope`", "`10x_xenium`", "`bgi_stereoseq`", "`cosmx_smi`", "`vizgen_merscope`", "`pixel_seq`", "`generic`")
`--in-mex`	required	string	Path to the input MEX directory containing gene × barcode matrix
`--units-per-um`	required	float	Scale to convert coordinates to microns (default: `1.0`)
`--out-dir`	required	string	Output directory for the converted SGE files
`--makefn`		string	File name for the generated Makefile (default: `sge_convert.mk`)
`--exclude-feature-regex`		regex	Pattern to exclude control features
`--sge-visual`		flag	Enable SGE visualization step (generates diagnostic image) (default: `FALSE`)
`--spatula`		string	Path to the spatula binary (default: `spatula`)
`--n-jobs`		int	Number of parallel jobs for processing (default: `1`)
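The next step reads three files from `./sge`, so it is worth confirming they exist and are non-empty once the conversion has finished. A minimal sketch follows; `check_sge_outputs` is a hypothetical helper, and the file names are taken from the `run_ficture2` command in this tutorial.

```shell
# Check that the converted SGE files consumed by run_ficture2 are present.
# Returns non-zero if any file is missing or empty.
check_sge_outputs() {
  sge_dir=${1:-./sge}
  rc=0
  for f in transcripts.unsorted.tsv.gz feature.clean.tsv.gz coordinate_minmax.tsv; do
    if [ -s "$sge_dir/$f" ]; then
      echo "found: $sge_dir/$f"
    else
      echo "missing: $sge_dir/$f" >&2
      rc=1
    fi
  done
  return $rc
}

# Usage: check_sge_outputs ./sge
```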

`FICTURE` Analysis

Compute spatial factors using punkst (FICTURE2 mode). See more details in the Reference page.

cartloader run_ficture2 \
  --makefn run_ficture2.mk \
  --main \
  --in-transcript ./sge/transcripts.unsorted.tsv.gz \
  --in-feature ./sge/feature.clean.tsv.gz \
  --in-minmax ./sge/coordinate_minmax.tsv \
  --cmap-file ${cmap} \
  --exclude-feature-regex '^(mt-.*$|Gm\d+$)' \
  --out-dir ./ficture2 \
  --width ${train_width} \
  --n-factor ${n_factor} \
  --spatula ${spatula} \
  --ficture2 ${punkst} \
  --n-jobs ${n_jobs} \
  --threads ${n_jobs}

Parameter	Required	Type	Description
`--main`	required ¹	flag	Enable `cartloader` to run all five steps
`--in-transcript`	required	string	Path to input transcript-level SGE file
`--out-dir`	required	string	Path to output directory
`--width`	required	int or comma-separated list	LDA training hexagon width(s)
`--n-factor`	required	int or comma-separated list	Number of LDA factors
`--makefn`		string	File name for the generated Makefile (default: `run_ficture2.mk` )
`--in-feature`		string	Path to input feature file
`--in-minmax`		string	Path to input coordinate min/max file
`--cmap-file`		string	Path to color map file
`--exclude-feature-regex`		regex	Pattern to exclude features
`--spatula`		string	Path to the `spatula` binary (default: `spatula`)
`--ficture2`		string	Path to the `punkst` directory (defaults to `punkst` repository within `submodules` directory of `cartloader`)
`--n-jobs`		int	Number of parallel jobs (default: `1`)
`--threads`		int	Number of threads per job (default: `1`)

¹ cartloader requires the user to specify at least one action. Available actions include: --tile to run the tiling step; --segment to run the segmentation step; --init-lda to run the LDA training step; --decode to run the decoding step; --summary to run the summarization step; and --main to run all five actions above.
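To preview which gene symbols a pattern such as the `--exclude-feature-regex` value above would remove, the pattern can be tried against a few names with `grep -E`. This sketch makes two assumptions: that the pattern is matched against gene symbols, and that `\d` can be rewritten as `[0-9]` (POSIX extended regular expressions do not define `\d`). `preview_excluded` is a hypothetical helper.

```shell
# Read gene symbols on stdin and print the ones matching the exclusion pattern.
# ERE equivalent of '^(mt-.*$|Gm\d+$)': mitochondrial genes and Gm<number> genes.
preview_excluded() {
  grep -E '^(mt-.*$|Gm[0-9]+$)'
}

# mt-Co1 and Gm29155 match; Snap25 and Gmfb do not.
printf '%s\n' mt-Co1 Gm29155 Snap25 Gmfb | preview_excluded
```

Note that `Gmfb` is kept because the pattern requires digits after `Gm` up to the end of the symbol.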

`cartloader` Compilation

Generate pmtiles and web-compatible tile directories. See more details in the Reference page.

cartloader run_cartload2 \
  --makefn run_cartload2.mk \
  --fic-dir ./ficture2 \
  --out-dir ./cartload2 \
  --id ${DATA_ID} \
  --spatula ${spatula} \
  --pmtiles ${pmtiles} \
  --tippecanoe ${tippecanoe} \
  --n-jobs ${n_jobs} \
  --threads ${n_jobs}

Parameter	Required	Type	Description
`--fic-dir`	required	string	Path to the input directory containing FICTURE2 output
`--out-dir`	required	string	Path to the output directory for PMTiles and web tiles
`--id`	required	string	Dataset ID used for naming outputs and metadata
`--makefn`		string	File name for the generated Makefile (default: `run_cartload2.mk`)
`--spatula`		string	Path to the `spatula` binary (default: `spatula`)
`--pmtiles`		string	Path to the `pmtiles` binary (default: `pmtiles`)
`--tippecanoe`		string	Path to the `tippecanoe` binary (default: `tippecanoe`)
`--n-jobs`		int	Number of parallel jobs (default: `1`)
`--threads`		int	Number of threads per job (default: `1`)

Upload to Data Repository

Choose a data repository to host/share your output

cartloader supports two upload options (AWS and Zenodo) for storing PMTiles of SGE and spatial factors in a data repository.

Choose the one that best suits your needs.

AWS Uploads

Upload the generated cartloader outputs to your designated AWS S3 directory:

# AWS S3 target location for cartostore
AWS_BUCKET="EXAMPLE_AWS_BUCKET"         # replace EXAMPLE_AWS_BUCKET with your actual S3 bucket name

cartloader upload_aws \
  --in-dir ./cartload2 \
  --s3-dir "s3://${AWS_BUCKET}/${DATA_ID}" \
  --aws ${aws} \
  --n-jobs ${n_jobs}

Parameter	Required	Type	Description
`--in-dir`	required	string	Path to the input directory containing the cartloader compilation output
`--s3-dir`	required	string	Path to the target S3 directory for uploading
`--aws`		string	Path to the AWS CLI binary
`--n-jobs`		int	Number of parallel jobs

Zenodo Uploads

Upload the generated cartloader outputs to your designated Zenodo deposition or a new deposition.

zenodo_token=/path/to/zenodo/token/file    # replace /path/to/zenodo/token/file with the path to your Zenodo token file

cartloader upload_zenodo \
  --in-dir ./cartload2 \
  --upload-method catalog \
  --zenodo-token $zenodo_token \
  --create-new-deposition \
  --title "Your Title" \
  --creators "Your Name" \
  --description "This is an example description"

Parameter	Required	Type	Description
`--in-dir`	required	string	Path to the input directory containing the cartloader compilation output
`--upload-method`	required	string	Method to determine which files to upload. Options: `all` to upload all files in `--in-dir`; `catalog` to upload files listed in a catalog YAML file; `user_list` to upload files explicitly listed via `--in-list`
`--catalog-yaml`		string	Used with `--upload-method catalog`. Path to the catalog.yaml file generated by `run_cartload2`. If omitted, the catalog.yaml in the directory given by `--in-dir` is used.
`--zenodo-token`	required	string	Path to your Zenodo access token file
`--create-new-deposition`		flag	Create a new Zenodo deposition
`--title`	required	string	Required if `--create-new-deposition`. Title for the new Zenodo deposition.
`--creators`	required	list of str	List of creators in "Lastname, Firstname" format.

Output Data

See more details of output at the Reference pages for run_ficture2 and run_cartload2.

Spatial Factor Inference from `FICTURE`

Below is an example of spatial factor inference results from FICTURE using a training width of 18, 12 factors, a fit width of 18, and an anchor resolution of 6.

FICTURE cmap

Factor	RGB	Weight	PostUMI	TopGene_pval	TopGene_fc	TopGene_weight
0	255,101,101	0.57145	2577934	Snap25,Cck,Hpca,Syt1,Atp1b1,Selenow,Ywhah,Scg5,Cnih2,Atp1a3,Vsnl1,Gnas,Cpe,Prkcb,Stmn2,Atp2b1,Gng3,Prkcg,Snrpn,Norad	Nr2c1,Calca,Wdr91,Ccbe1,Esyt1,Epha3,Slc9a4,Thap7,Mapk11,Klk8,Arhgef6,1110018N20Rik,Iba57,Klk10,Gjd2,Leng9,Zbtb46,Slc9a2,Akr1c18,Dlx6os1	Snap25,Tmsb4x,Atp1b1,Actb,Cpe,Ywhah,Nrgn,Ppp3ca,Selenow,Calm1,Atp1a3,Calm2,Fth1,Rtn1,Cox8a,Gnas,Camk2a,Norad,Aldoa_v1,Ndrg4
1	237,238,0	0.15458	697338	Ppp3ca,Nrgn,Ptk2b,Olfm1,Ppp3r1,Gria2,Ncdn,Nsf,Syne1,Snca,Chn1,Tmsb4x,Rasgrf1,Grin2a,Enc1,Kalrn,Wasf1,Camk2b,Calm2,Epha7	Vwa8,Zfp583,Sppl2b,Faap24,Recql5,A230051N06Rik,Ints6l,Rnaseh2b,Strada,Scnn1a,Snhg17,Prelid3a,Grhl1,Icam4,Slc44a5,Tyw1,Parp8,Asb11,Vipr1,Zfp668	Ppp3ca,Nrgn,Tmsb4x,Actb,Snap25,Atp1b1,Olfm1,Ywhah,Calm2,Ncdn,Rtn1,Ptk2b,Calm1,Ppp3r1,Actg1,Chn1,Fth1,Ndrg4,Fkbp1a,Cfl1
2	101,255,101	0.116	523314	Cst3,Glul,Slc1a2,Mt1,Apoe,Sparcl1,Aldoc,Clu,Atp1a2,Slc1a3,Camk2a,Ndrg2,Gfap,Mt2,Gpr37l1,Atp1b2,Fam107a,Prdx6,Bc1,Plpp3	Pdk4,Bgn,Unc93b1,Rfx4,Ccr5,Phkg1,Slc38a3,Pax6,Chil1,Gli3,Thbs4,Ppp1r18,Sh3pxd2b,Lyz2,Cpt1a,Aif1,Pdlim4,Cd33,Lcat,Arhgef19	Cst3,Glul,Slc1a2,Sparcl1,Camk2a,Cpe,Mt1,Apoe,Mbp,Fth1,Aldoc,Clu,Ttr,Camk2n1,Ckb,Ddn,Rps29,Atp1a2,Scd2,Mt3
3	101,254,255	0.09284	418815	Fam163b,Prox1,Adcy1,Stxbp6,C1ql2,Btbd3,Synpr,Sema5a,Dsp,Eef1a1,Ncdn,Jun,Lrrtm4,Rfx3,Olfm1,Dgkh,Marcksl1,Ncald,Pitpnm2,Nrgn	Il20rb,Tdo2,Col22a1,C1ql2,St3gal1,Plk5,Dsp,Prox1os,Fam163b,Prox1,Rph3al,H2bc6,Lrrtm4,Mcm6,Khdrbs2,Vwa3b,Prdm5,Npnt,Dact1,Stxbp6	Nrgn,Eef1a1,Ncdn,Olfm1,Ppp3ca,Actb,Tmsb4x,Calm1,Camk2a,Sparcl1,Atp1b1,Adcy1,Fam163b,Tspan7,Ndrg4,Rplp1,Arf3,Camk2n1,Rpl17,Ptk2b
4	101,101,255	0.03885	175240	Plp1,Mbp,Gatm,Mobp,Cnp,Cldn11,Fth1,Car2,Ermn,Cryab,Qdpr,Tubb4a,Trf,Plekhb1,Qki,Tspan2,Mal,Bcas1,Tmem88b,Septin4	Insc,Creb5,Tmem125,Trim36,Hapln2,Pde1c,Gjc2,Prr5l,Gjc3,Sec14l5,Gatm,Nkx6-2,Tmem88b,Adamts4,Plekhh1,Ermn,Plp1,Trim59,Plekhg3,Enpp6	Mbp,Plp1,Fth1,Mobp,Ptgds,Actb,Gatm,Tubb4a,Cnp,Car2,Cldn11,Tpt1,Scd2,Glul,Qdpr,App,Qki,Ptma,Malat1,Lars2
5	255,101,254	0.02347	105862	Ttr,Ptgds,Enpp2,Tac2,Gng8,Sostdc1,Ecrg4,Zic1,Calb2,Nnat,Dcn,Tmem212,Adcyap1,Gpr151,2900040C04Rik,Necab2,Apod,Nhlh2,Pou4f1,Ace	Tmem212,Kcne2,Cldn2,Dcn,Septin10,Col8a2,Pou4f1,Nhlh2,Gng8,Wif1,Chrna3,Adcyap1,Folr1,Col1a2,Clec3b,Gng14,Sostdc1,Foxc1,Ecrg4,Tac2	Ttr,Ptgds,Enpp2,Apoe,Cpe,Pcp4,Actb,Eef1a1,Nnat,Psap,Fth1,Tpt1,Cst3,Dbi,Atp1b1,Cox8a,Sparcl1,Clu,Rpl23,Cox7c
6	255,178,101	0.00163	7366	Nkx2-2,Abtb2,Myo1d,C030029H02Rik,Pogk,Dusp16,Smco3,Gjb1,Ldlrad3,Rhobtb3,Lrrc8c,Cdr2,Piga,Tjap1,Carns1,Gpt,Prim1,Sh3gl3,Plk3,Cerox1	Nkx2-2,Abtb2,C030029H02Rik,Myo1d,Dusp16,Smco3,Gjb1,Pogk,Prim1,Ldlrad3,Plk3,Vgll4,Carns1,Gpt,Letm2,Cdr2,Piga,Fign,Lmln,Tjap1	Mbp,Fth1,Pogk,Malat1,Qdpr,Plp1,Myo1d,Tubb4a,Abtb2,Rps27a,Rhobtb3,Glul,Gatm,Nkx2-2,Sh3gl3,Alkbh5,C030029H02Rik,Smco3,Tpt1,Gjb1
7	178,255,101	0.00073	3290	Sst,Crhbp,Npy,Cort,Reln,Uhrf1bp1,Rab3b,2310010J17Rik,Elfn1,Lypd6b,Lgals1,Rpp25,Gad2,Cdh13,Clic5,Cenpf,Dlx1,Bcam,Foxred2,Sec14l5	Sst,Uhrf1bp1,Crhbp,Cort,Reln,Lypd6b,2310010J17Rik,Elfn1,Rab3b,Npy,Clic5,Rpp25,Cenpf,Bcam,Lgals1,Sec14l5,Dlx1,Foxred2,Gpc3,Cdh13	Sst,Npy,Crhbp,Atp1b1,Reln,Cort,Zwint,Gad2,Rab3b,Syt1,2310010J17Rik,Snap25,Sparcl1,Atp1a3,Oxr1,Scg2,Mdh1,Atp6v0c,Atp6v1e1,Vgf
9	101,178,255	0.0004	1817	Hba-a2,Hbb-bs,Hba-a1,Hbb-bt,Tent5c,Polr2l,Aven,Tinagl1,Map2k3,Rgs6,Bst2,Zfp318,Stk40,Plekha8,Ube2l6,Rad50,Slc12a4,Kank2,Srgap1,Klf2	Hba-a1,Hbb-bs,Hbb-bt,Hba-a2,Tent5c,Polr2l,Aven,Tinagl1,Map2k3,Rgs6,Bst2,Slc12a4,Stk40,Ube2l6,Plekha8,Kank2,Rad50,Zfp318,Prtg,Srgap1	Hba-a2,Hbb-bs,Hba-a1,Hbb-bt,Polr2l,Atp1b1,Camk2a,Plekhb1,Tent5c,Rpl31,Praf2,Tpd52,Ptgds,Ddn,Fth1,Calm3,Rpl38,Mobp,Psme3,Mkrn1
8	0,223,95	5e-05	214	Sass6,Recql5,Jmjd4,A230072C01Rik,Zfp583,Eef1akmt2,Rnaseh2b,Orc6,Akap10,Mak16,Slc25a35,Dclre1b,C130074G19Rik,Fbxw4,Arhgap10,Il17ra,Zbtb2,Katnip,Hook2,Ints6l	Sass6,Recql5,Jmjd4,A230072C01Rik,Zfp583,Rnaseh2b,Eef1akmt2,Akap10,Orc6,Slc25a35,Dclre1b,C130074G19Rik,Mak16,Arhgap10,Il17ra,Fbxw4,Katnip,Zbtb2,Hook2,Ints6l	Sass6,Recql5,Jmjd4,Cst3,Mak16,A230072C01Rik,Orc6,Tmsb4x,Zfp583,Hexb,Eef1akmt2,Capzb,Gabarapl1,Mdh2,Odc1,Rpl6,Olfm1,Rps8,Golga1,Zfp106
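A table like the one above can be condensed on the command line to one line per factor. The sketch below assumes the table is saved as a tab-separated file with the same header and column order; `top_genes` and the file name in the usage line are hypothetical.

```shell
# Print factor ID, weight, and the first N genes of the TopGene_pval column
# (column 5) from a tab-separated factor table with a header row.
top_genes() {
  awk -F'\t' -v n="${2:-3}" 'NR > 1 {
    m = split($5, g, ",")
    s = g[1]
    for (i = 2; i <= n && i <= m; i++) s = s "," g[i]
    print $1 "\t" $3 "\t" s
  }' "$1"
}

# Usage (file name is an assumption): top_genes factor_info.tsv 3
```

For factor 0 in the table above, this would print the factor ID, its weight of 0.57145, and `Snap25,Cck,Hpca`.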

Packed SGE and Spatial Factor Outputs from `run_cartload2`

The packed SGE data and spatial factor inferences generated by FICTURE are available in PMTiles format on Zenodo: DOI:10.5281/zenodo.15759403.

These datasets can also be loaded directly using the following catalog YAML file:
https://zenodo.org/records/15802634/catalog.yaml