
Pixel-Seq Starter Tutorial

Input Data

The Pixel-Seq publication provides spatial gene expression (SGE) data only for the mouse olfactory bulb and parabrachial nucleus, neither of which includes the hippocampus. This tutorial therefore uses a subregion extracted from the olfactory bulb as input.

File Format

The Pixel-Seq SGE data consist of a single tab-delimited text file in which each row represents a unique RNA molecule detected within a defined field of view (FOV), together with its genomic and spatial metadata.

TSV file format

FOVx  FOVy  xcoord     ycoord     UMIs     SpatialBarcode            MapStrand  Chrom  Start      STARmapping        Counts  geneID              geneName  bioType                 intronRatio
017   005   26691.5    5786.5     TAACGAA  AAGGTTCATACCTACGACTGTTAA  16         1      24613729   150M               1       ENSMUSG00000101111  Gm28437   unprocessed_pseudogene  0.00
016   007   27590.25   4639.0815  TAATATA  AATGGCGCATTTTGCTGTTTAGGC  16         2      39001628   138M2341N12M       1       ENSMUSG00000062997  Rpl35     protein_coding          0.00
018   006   25099.945  5621.8335  AGTTGTA  CTGCATATGTGTCACCTAGGTAGC  16         1      24615767   150M               1       ENSMUSG00000101249  Gm29216   unprocessed_pseudogene  0.00
  • FOVx, FOVy: Field-of-view indices indicating the imaging tile coordinates in the x and y directions.
  • xcoord, ycoord: Spatial coordinates (in microns or pixels).
  • UMIs: Unique molecular identifier (UMI) sequence.
  • SpatialBarcode: Spatial barcode capturing the location and identity.
  • MapStrand: Indicates the strand orientation of the mapped read.
  • Chrom, Start: Chromosome number and start position of the mapped read on the genome.
  • STARmapping: Alignment pattern (CIGAR string) from the STAR aligner indicating how the transcript maps to the genome.
  • Counts: Number of times the UMI/gene combination was observed.
  • geneID, geneName: Ensembl gene ID and gene symbol.
  • bioType: Gene biotype.
  • intronRatio: Fraction of UMI counts assigned to intronic regions.
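For a first look at a file with this layout, the columns can be addressed by header name rather than by position. The following is a minimal sketch and not part of the cartloader workflow. It rebuilds the three example rows above into a temporary tab-delimited file and sums Counts per geneName with awk. The real input is gzip-compressed, so it would be streamed through `gzip -dc` first. The variable names `sample` and `gene_counts` are illustrative.

```shell
# Recreate the three example rows as a tab-delimited file.
# (No field contains a space, so turning spaces into tabs is safe here.)
sample=$(mktemp)
tr -s ' ' '\t' > "$sample" <<'EOF'
FOVx FOVy xcoord ycoord UMIs SpatialBarcode MapStrand Chrom Start STARmapping Counts geneID geneName bioType intronRatio
017 005 26691.5 5786.5 TAACGAA AAGGTTCATACCTACGACTGTTAA 16 1 24613729 150M 1 ENSMUSG00000101111 Gm28437 unprocessed_pseudogene 0.00
016 007 27590.25 4639.0815 TAATATA AATGGCGCATTTTGCTGTTTAGGC 16 2 39001628 138M2341N12M 1 ENSMUSG00000062997 Rpl35 protein_coding 0.00
018 006 25099.945 5621.8335 AGTTGTA CTGCATATGTGTCACCTAGGTAGC 16 1 24615767 150M 1 ENSMUSG00000101249 Gm29216 unprocessed_pseudogene 0.00
EOF

# Map header names to column positions, then sum Counts per geneName.
gene_counts=$(awk -F'\t' '
  NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
  { total[$(col["geneName"])] += $(col["Counts"]) }
  END { for (g in total) printf "%s\t%d\n", g, total[g] }
' "$sample" | LC_ALL=C sort)

printf '%s\n' "$gene_counts"
rm -f "$sample"
```

Looking columns up by name keeps the command valid even if the column order differs between files.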

Data Access

The example data is hosted on Zenodo (https://zenodo.org/records/15701394).

Follow the commands below to download the example data.

work_dir=/path/to/work/directory
cd "$work_dir"
wget https://zenodo.org/records/15701394/files/pixelseq_starter.raw.tar.gz
tar --strip-components=1 -zxvf pixelseq_starter.raw.tar.gz
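After extraction, a quick integrity check can catch a truncated download before any processing starts. The sketch below is an illustration under two assumptions: the archive unpacks to `./input.tsv.gz` (the file name passed to `--in-csv` later in this tutorial), and the columns xcoord, ycoord, Counts, and geneName are the ones worth verifying. The function name `check_sge_input` is hypothetical.

```shell
# Check that a gzip-compressed SGE table is intact and has the expected columns.
# Prints a message to stderr and returns non-zero on the first problem found.
check_sge_input() {
  sge_file=$1
  gzip -t "$sge_file" 2>/dev/null ||
    { echo "unreadable gzip file: $sge_file" >&2; return 1; }
  sge_header=$(gzip -dc "$sge_file" | head -n 1)
  for sge_col in xcoord ycoord Counts geneName; do
    printf '%s\n' "$sge_header" | tr '\t' '\n' | grep -qx "$sge_col" ||
      { echo "missing column: $sge_col" >&2; return 1; }
  done
  echo "ok: $sge_file"
}

# Run the check only if the extracted file is present in the current directory.
if [ -f ./input.tsv.gz ]; then
  check_sge_input ./input.tsv.gz
fi
```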

Set Up the Environment

Define the paths to all required binaries and resources, as well as the target AWS S3 bucket. Optionally, specify a fixed color map for consistent rendering.

# ====
# Replace each placeholder with the actual path on your system.  
# ====
work_dir=/path/to/work/directory        # path to work directory that contains the downloaded input data
cd "$work_dir"

# Define paths to required binaries and resources
spatula=/path/to/spatula/binary         # path to spatula executable
punkst=/path/to/punkst/binary           # path to FICTURE2/punkst executable
tippecanoe=/path/to/tippecanoe/binary   # path to tippecanoe executable
pmtiles=/path/to/pmtiles/binary         # path to pmtiles executable
aws=/path/to/aws/cli/binary             # path to AWS CLI binary

# (Optional) Define path to color map. 
cmap=/path/to/color/map                 # Path to the fixed color map for rendering. cartloader provides a fixed color map at cartloader/assets/fixed_color_map_256.tsv.

# AWS S3 target location for cartostore
AWS_BUCKET="EXAMPLE_AWS_BUCKET"         # replace EXAMPLE_AWS_BUCKET with your actual S3 bucket name

# Activate the bioconda environment
conda activate BIOENV_NAME              # replace BIOENV_NAME with your bioconda environment name
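Because every later step depends on these variables, it can help to confirm that each one resolves to something executable before launching a long run. The following is a small sketch, not a cartloader feature. The helper name `check_tool` is hypothetical, and any variable that is unset falls back to the bare command name on PATH.

```shell
# Succeeds if the argument is an executable path or a command found on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "not found or not executable: $1" >&2
    return 1
  fi
}

# Check every tool variable defined above and count the failures.
missing=0
for tool in "${spatula:-spatula}" "${punkst:-punkst}" "${tippecanoe:-tippecanoe}" \
            "${pmtiles:-pmtiles}" "${aws:-aws}"; do
  check_tool "$tool" || missing=$((missing + 1))
done
echo "tools missing: $missing"
```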

Define data ID and analysis parameters:

# Unique identifier for your dataset
DATA_ID="pixelseq_hippo"                # change this to reflect your dataset name
PLATFORM="pixel_seq"                    # platform information
SCALE=3.076923                          # coordinate units per micrometer (1 / 0.325)

# LDA parameters
train_width=18                           # define LDA training hexagon width (comma-separated if multiple widths are applied)
n_factor=6,12                            # define number of factors in LDA training (comma-separated if multiple n-factor are applied)
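Both LDA parameters accept comma-separated lists. The sketch below only enumerates the pairs implied by the values above. Whether cartloader trains every width and factor combination is an assumption here, although it is the natural reading of the comma-separated syntax.

```shell
train_width=18
n_factor=6,12

# Split each comma-separated list and print every (width, n_factor) pair.
combos=$(
  for w in $(printf '%s' "$train_width" | tr ',' ' '); do
    for k in $(printf '%s' "$n_factor" | tr ',' ' '); do
      echo "width=${w} n_factor=${k}"
    done
  done
)
printf '%s\n' "$combos"
```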

How to Define Scaling Factors for Pixel-Seq?

The Pixel-Seq publication states:

"Because polonies have varied sizes and shapes, to maximize the feature resolution we developed a base-calling pipeline to determine the major barcode species in each pixel (0.325 × 0.325 µm²) of gel images to construct a spatial barcode map."

Each coordinate unit is therefore one pixel with an edge length of 0.325 µm, so the scale is 1/0.325 ≈ 3.076923 coordinate units per micrometer.
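The arithmetic can be reproduced on the command line. This is a sketch that assumes the 0.325 pixel edge length is expressed in micrometers, as the `--units-per-um` parameter implies. The variable names `pixel_um`, `scale`, and `x_um` are illustrative.

```shell
# Pixel edge length in micrometers (assumed from the quoted pixel size).
pixel_um=0.325

# Coordinate units per micrometer, rounded to six decimals.
scale=$(LC_ALL=C awk -v p="$pixel_um" 'BEGIN { printf "%.6f", 1 / p }')
echo "units per um: $scale"

# Convert the first example x coordinate (26691.5 units) to micrometers.
x_um=$(LC_ALL=C awk -v x=26691.5 -v p="$pixel_um" 'BEGIN { printf "%.2f", x * p }')
echo "xcoord 26691.5 -> ${x_um} um"
```

Dividing a coordinate by the scale is equivalent to multiplying it by the pixel size, which is what the second command does.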

SGE Format Conversion

Convert the raw input to the unified SGE format. See the Reference page for more details.

cartloader sge_convert \
  --makefn sge_convert.mk \
  --platform ${PLATFORM} \
  --in-csv ./input.tsv.gz \
  --units-per-um ${SCALE} \
  --out-dir ./sge \
  --exclude-feature-regex '^(BLANK|NegCon|NegPrb)' \
  --sge-visual \
  --spatula ${spatula} \
  --n-jobs 10

Parameter                Required  Type    Description
--platform               required  string  Platform (options: "10x_visium_hd", "seqscope", "10x_xenium", "bgi_stereoseq", "cosmx_smi", "vizgen_merscope", "pixel_seq", "generic")
--in-csv                 required  string  Path to the input TSV/CSV file
--units-per-um           required  float   Scale to convert coordinates to microns (default: 1.0)
--out-dir                required  string  Output directory for the converted SGE files
--makefn                 -         string  File name for the generated Makefile (default: sge_convert.mk)
--exclude-feature-regex  -         regex   Pattern to exclude control features
--sge-visual             -         flag    Enable SGE visualization step (generates diagnostic image) (default: FALSE)
--spatula                -         string  Path to the spatula binary (default: spatula)
--n-jobs                 -         int     Number of parallel jobs for processing (default: 1)
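To see what the `--exclude-feature-regex` pattern in the sge_convert command removes, it can be tried against a few names with `grep -E`. This is only an illustration. The feature names are made up, and cartloader's own regex engine may differ from grep, although for a plain alternation like this one the behavior should coincide.

```shell
# Hypothetical feature names: two genes plus three control-style entries.
features="Rpl35
BLANK_0001
NegControlProbe_0001
NegPrb_0002
Gm28437"

# Keep only the names that do NOT match the exclusion pattern.
kept=$(printf '%s\n' "$features" | grep -Ev '^(BLANK|NegCon|NegPrb)')
printf '%s\n' "$kept"
```

Only Rpl35 and Gm28437 survive, because the pattern is anchored at the start of the name and the other three begin with one of the listed prefixes.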

FICTURE Analysis

Compute spatial factors using punkst (FICTURE2 mode). See the Reference page for more details.

cartloader run_ficture2 \
  --makefn run_ficture2.mk \
  --main \
  --in-transcript ./sge/transcripts.unsorted.tsv.gz \
  --in-feature ./sge/feature.clean.tsv.gz \
  --in-minmax ./sge/coordinate_minmax.tsv \
  --cmap-file ${cmap} \
  --exclude-feature-regex '^(mt-.*$|Gm\d+$)' \
  --out-dir ./ficture2 \
  --width ${train_width} \
  --n-factor ${n_factor} \
  --spatula ${spatula} \
  --ficture2 ${punkst} \
  --n-jobs 10 \
  --threads 10

Parameter                Required      Type                         Description
--main                   required [1]  flag                         Enable cartloader to run all five steps
--in-transcript          required      string                       Path to input transcript-level SGE file
--out-dir                required      string                       Path to output directory
--width                  required      int or comma-separated list  LDA training hexagon width(s)
--n-factor               required      int or comma-separated list  Number of LDA factors
--makefn                 -             string                       File name for the generated Makefile (default: run_ficture2.mk)
--in-feature             -             string                       Path to input feature file
--in-minmax              -             string                       Path to input coordinate min/max file
--cmap-file              -             string                       Path to color map file
--exclude-feature-regex  -             regex                        Pattern to exclude features
--spatula                -             string                       Path to the spatula binary (default: spatula)
--ficture2               -             string                       Path to the punkst directory (defaults to punkst repository within submodules directory of cartloader)
--n-jobs                 -             int                          Number of parallel jobs (default: 1)
--threads                -             int                          Number of threads per job (default: 1)

[1] cartloader requires the user to specify at least one action. Available actions include --tile (tiling), --segment (segmentation), --init-lda (LDA training), --decode (decoding), and --summary (summarization); --main runs all five.
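The `--exclude-feature-regex` pattern in the run_ficture2 command drops mitochondrial genes (prefix `mt-`) and predicted genes named `Gm` followed only by digits. The sketch below shows its effect on a few illustrative gene names. Because `\d` is not part of POSIX extended regular expressions, it is written as `[0-9]` for `grep -E` here. That translation, and the assumption that cartloader's engine treats `\d` as a digit class, are mine.

```shell
# Illustrative gene names, including Gm2a, which must not be excluded.
features="Rpl35
mt-Co1
Gm28437
Gm2a
Gapdh"

# Keep only the names that do NOT match the exclusion pattern.
kept=$(printf '%s\n' "$features" | grep -Ev '^(mt-.*$|Gm[0-9]+$)')
printf '%s\n' "$kept"
```

The trailing `$` in the second alternative matters: it stops the pattern from discarding genes such as Gm2a, whose names merely begin with `Gm` and a digit.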

cartloader Compilation

Generate PMTiles and web-compatible tile directories. See the Reference page for more details.

cartloader run_cartload2 \
  --makefn run_cartload2.mk \
  --fic-dir ./ficture2 \
  --out-dir ./cartload2 \
  --id ${DATA_ID} \
  --spatula ${spatula} \
  --pmtiles ${pmtiles} \
  --tippecanoe ${tippecanoe} \
  --n-jobs 10 \
  --threads 10

Parameter     Required  Type    Description
--fic-dir     required  string  Path to the input directory containing FICTURE2 output
--out-dir     required  string  Path to the output directory for PMTiles and web tiles
--id          required  string  Dataset ID used for naming outputs and metadata
--makefn      -         string  File name for the generated Makefile (default: run_cartload2.mk)
--spatula     -         string  Path to the spatula binary (default: spatula)
--pmtiles     -         string  Path to the pmtiles binary (default: pmtiles)
--tippecanoe  -         string  Path to the tippecanoe binary (default: tippecanoe)
--n-jobs      -         int     Number of parallel jobs (default: 1)
--threads     -         int     Number of threads per job (default: 1)

Upload to Data Repository

AWS Uploads

Copy the generated cartloader outputs to your designated AWS S3 catalog path:

cartloader upload_aws \
  --in-dir ./cartload2 \
  --s3-dir "s3://${AWS_BUCKET}/${DATA_ID}" \
  --aws ${aws} \
  --n-jobs 10

Parameter  Required  Type    Description
--in-dir   required  string  Path to the input directory containing the cartloader compilation output
--s3-dir   required  string  Path to the target S3 directory for uploading
--aws      -         string  Path to the AWS CLI binary
--n-jobs   -         int     Number of parallel jobs