
Rule sdgeAR_segment:

Purpose

The sdgeAR_segment rule transforms pixel-based SGEs into hexagon-based SGEs by aggregating pixels into a hexagonal grid. The hexagon size is set by the user.
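The aggregation idea can be sketched as follows. This is a minimal illustration using standard axial-coordinate hexagon binning, not the code the rule actually runs. The function names (`pixel_to_hex`, `aggregate_pixels`), the pointy-top layout, and the reading of the hexagon width as the flat-to-flat distance in um are all assumptions made for the example.

```python
import math
from collections import Counter


def _axial_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon (cube rounding)."""
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Recompute the component with the largest rounding error so x + y + z == 0.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def pixel_to_hex(x, y, hex_width, mu_scale=1000.0):
    """Return the axial (q, r) index of the hexagon containing pixel (x, y).

    Assumptions (hypothetical, for illustration only): pointy-top hexagons,
    hex_width is the flat-to-flat width in um, and dividing by mu_scale
    converts the input coordinates to um.
    """
    size = hex_width / math.sqrt(3)        # center-to-corner distance in um
    px, py = x / mu_scale, y / mu_scale    # pixel position in um
    q = (math.sqrt(3) / 3 * px - py / 3) / size
    r = (2 / 3 * py) / size
    return _axial_round(q, r)


def aggregate_pixels(pixels, hex_width, mu_scale=1000.0):
    """Sum counts per (hexagon, gene) from (x, y, gene, count) records."""
    totals = Counter()
    for x, y, gene, count in pixels:
        totals[(pixel_to_hex(x, y, hex_width, mu_scale), gene)] += count
    return totals


# Toy pixels with coordinates in nanometers (made-up values).
pixels = [(0, 0, "Alb", 1), (1000, 1000, "Alb", 2), (24000, 0, "Apoe", 1)]
hex_counts = aggregate_pixels(pixels, hex_width=24, mu_scale=1000)
```

In this toy input the first two pixels fall into the same hexagon, so their counts for Alb are summed, while the third pixel lies one hexagon width away and is assigned to a neighboring hexagon.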

Input Files

  • FICTURE-compatible SGE and the Meta File for Coordinates: The required inputs are a FICTURE-compatible SGE and its corresponding meta file for X and Y coordinates. The FICTURE-compatible SGE is produced by Rule sdgeAR_reformat. The meta file for coordinates can be generated by sdge2sdgeAR or created manually.

Output Files

The rule generates the following output in the specified directory path:

<output_directory>/analysis/<run_id>/<unit_id>/segment/

(1) Hexagon-based SGE

Description: This output is an SGE in which pixels have been aggregated into hexagonal units. The hexagon size is defined by the user. The SGE follows the 10x Genomics format.

File Naming Convention:

<unit_id>.merged.matrix.tsv.gz

File Format:

Warning

The barcodes.tsv.gz and features.tsv.gz files in the hexagon-based SGE differ slightly from those in the pixel-based SGE illustrated in Rule dge2sdge.

  • barcodes.tsv.gz:

    1_0.0_3059096.64_1620124.64_11
    2_0.0_3727394.36_3208789.64_11
    3_0.0_4140308.56_2215259.44_17
    

    • Column 1: hexagon IDs
  • features.tsv.gz:

    ENSMUSG00000029368  Alb     Gene Expression
    ENSMUSG00000002985  Apoe      Gene Expression
    ENSMUSG00000078672  Mup20   Gene Expression
    

    • Column 1: Ensembl gene ID
    • Column 2: Gene symbol
    • Column 3: Gene info
  • matrix.mtx.gz:

    %%MatrixMarket matrix coordinate integer general
    %
    33951 79179 11120678
    826 1 1
    13 1 1
    3935 1 1
    

    • Header: The initial lines form the header, which declares that the file follows the Matrix Market (MTX) format and describes its properties. The header may include comment lines, each beginning with “%”, that carry extra metadata.
    • Dimensions: The first line after the header gives the matrix dimensions: the number of rows (features), the number of columns (barcodes), and the number of non-zero entries.
    • Data Entries: The lines after the dimensions list the non-zero entries. As in the example above, each line has three columns: row index (feature index), column index (barcode index), and the expression count.
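The layout described above (header, comments, dimensions, data entries) can be read with a few lines of standard-library Python. The sketch below is illustrative only: `parse_mtx` and `read_mtx_gz` are hypothetical helper names, and the parser keeps whatever value columns follow the two indices rather than assuming a fixed number.

```python
import gzip


def parse_mtx(lines):
    """Parse Matrix Market coordinate text into (dimensions, entries).

    dimensions is (n_features, n_barcodes, n_nonzero) as declared in the file;
    each entry is (feature_index, barcode_index, values), with 1-based indices.
    """
    it = iter(lines)
    header = next(it).strip()
    if not header.startswith("%%MatrixMarket"):
        raise ValueError("missing %%MatrixMarket header line")
    dims = None
    entries = []
    for line in it:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if dims is None:
            # First non-comment line after the header: the dimensions.
            dims = tuple(int(v) for v in fields[:3])
        else:
            row, col = int(fields[0]), int(fields[1])
            values = tuple(int(v) for v in fields[2:])
            entries.append((row, col, values))
    return dims, entries


def read_mtx_gz(path):
    """Read a gzip-compressed matrix.mtx.gz file from disk."""
    with gzip.open(path, "rt") as handle:
        return parse_mtx(handle)


# The excerpt shown above (only the first three of the declared entries).
excerpt = """%%MatrixMarket matrix coordinate integer general
%
33951 79179 11120678
826 1 1
13 1 1
3935 1 1
""".splitlines()

dims, entries = parse_mtx(excerpt)
```

For full-size matrices a sparse-matrix library is usually a better fit. The plain parser is shown here only to make the file layout concrete.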

Output Guidelines

The output files can serve as input for tools that require an SGE in the 10x Genomics format.

Parameters

downstream:
  mu_scale: 1000        
  segment:                 
    precision: 2
    min_pixel_per_unit: 10
    char:                
      - solofeature: gn    
        hexagonwidth: 24     
        segmentmove: 1     
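  • The mu_scale Parameter: Specifies the scale factor used to convert coordinates to micrometers (µm) when building hexagons. By default, the coordinates of the spatial digital gene expression matrix (SGE) are assumed to be in nanometers, so 1000 coordinate units correspond to 1 µm.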

  • The segment Field

    • The precision Parameter: Specifies the number of decimal digits used to store spatial locations (in µm; 0 stores integers).
    • The min_pixel_per_unit Parameter: The minimum UMI count a hexagon must have to be included in the output.
    • The char Parameter: Specifies the characteristics of the hexagons: the genomic feature used to create them (solofeature), the size of the hexagonal grid (hexagonwidth), and whether the SGE is built from overlapping or non-overlapping hexagons (segmentmove). When segmentmove is 1, a non-overlapping hexagon-based SGE is created.
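As a rough illustration of how precision and min_pixel_per_unit might act on the output, consider the sketch below. It is not taken from the rule's code: the helper names are hypothetical, the hexagon totals are made-up values, and treating the threshold as inclusive (>=) is an assumption.

```python
def format_location(value_um, precision):
    """Format a location in um with `precision` decimal digits (0 gives an integer string)."""
    return f"{value_um:.{precision}f}"


def keep_hexagons(umi_totals, min_count):
    """Keep hexagons whose total UMI count reaches min_count (inclusive, by assumption)."""
    return {hex_id: total for hex_id, total in umi_totals.items() if total >= min_count}


# Values mirroring the example configuration above.
config = {"precision": 2, "min_pixel_per_unit": 10}

# Hypothetical per-hexagon UMI totals keyed by hexagon ID.
umi_totals = {"1": 11, "2": 9, "3": 17}

kept = keep_hexagons(umi_totals, config["min_pixel_per_unit"])
two_digits = format_location(3059.09664, config["precision"])
as_integer = format_location(3059.2, 0)
```

With these toy values, hexagon "2" is dropped because its total is below the threshold, and the location is stored with two decimal digits.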

Dependencies

Rule sdgeAR_segment requires input from Rules sdgeAR_reformat and sdge2sdgeAR. It can therefore run only after sdgeAR_reformat, sdge2sdgeAR, and any of their prerequisite rules have completed successfully. See the Workflow Structure for an overview of the rule dependencies.

Code Snippet

The code for this rule is provided in a08_sdgeAR_segment.smk.