Rule sdgeAR_segment_ficture
Purpose
The sdgeAR_segment_ficture rule transforms transcript-indexed SGEs into hexagon-indexed SGEs by aggregating pixels into hexagonal grids whose size is set by the user. The resulting hexagon-indexed SGE is written in a TSV format that is compatible with FICTURE.
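The aggregation idea can be sketched as follows. This is only an illustration, not the code the rule runs: the function names, the pointy-top hexagon orientation, and the reading of the hexagon width as the flat-to-flat distance are all assumptions made for the example.

```python
import math
from collections import defaultdict


def hex_index(x, y, hexagon_width):
    """Return the axial (q, r) index of the pointy-top hexagon containing (x, y).

    Assumption: hexagon_width is the flat-to-flat width of the hexagon.
    """
    size = hexagon_width / math.sqrt(3)  # centre-to-corner distance
    # Fractional axial coordinates of the point.
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Cube rounding: round all three cube coordinates, then repair the one
    # with the largest rounding error so that they still sum to zero.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def aggregate(pixels, hexagon_width):
    """Sum the counts of (x, y, count) pixels that fall into the same hexagon."""
    totals = defaultdict(int)
    for x, y, count in pixels:
        totals[hex_index(x, y, hexagon_width)] += count
    return dict(totals)


# Hypothetical pixels: two near the origin, two near the neighbouring hexagon.
pixels = [(0.0, 0.0, 1), (0.1, 0.1, 2), (2.0, 0.0, 5), (1.9, 0.1, 1)]
hexagons = aggregate(pixels, hexagon_width=2.0)
```

Each key of `hexagons` identifies one hexagon, and each value is the total count of the pixels assigned to it.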
Input Files
- An SGE Matrix in a FICTURE-compatible Format and Corresponding Files: The necessary input files include a FICTURE-compatible SGE matrix and its corresponding meta file for X and Y coordinates. If the user requests a filtered hexagon-indexed SGE matrix (i.e., the quality_control field in the job configuration file is TRUE), this rule uses the filtered SGE matrix and its coordinate meta file from Rule sdgeAR_polygonfilter. Otherwise, it uses the raw SGE matrix created by Rule sdgeAR_reformat and the coordinate meta file from Rule sdgeAR_minmax.
- (Optional) A Strict Boundary GEOJSON File: When segmenting a filtered SGE matrix, the strict boundary GEOJSON file from Rule sdgeAR_polygonfilter is applied.
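The input selection described above can be summarized in a small sketch. The function name and dictionary keys are hypothetical and exist only for illustration; the rule names and the filtered/raw distinction come from this page.

```python
def select_inputs(quality_control):
    """Return which upstream rules supply the inputs (hypothetical helper).

    quality_control mirrors the quality_control field of the job configuration.
    """
    if quality_control:
        return {
            "sge": "filtered",
            "matrix_rule": "sdgeAR_polygonfilter",
            "coordinate_meta_rule": "sdgeAR_polygonfilter",
            # The strict boundary GEOJSON file is applied to filtered SGEs only.
            "uses_strict_boundary": True,
        }
    return {
        "sge": "raw",
        "matrix_rule": "sdgeAR_reformat",
        "coordinate_meta_rule": "sdgeAR_minmax",
        "uses_strict_boundary": False,
    }


filtered_inputs = select_inputs(True)
raw_inputs = select_inputs(False)
```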
Output Files
The rule generates the following output in the specified directory path:
* <sge_qc> stands for whether gene-filtering and polygon-filtering have been applied to the SGE matrix. For a filtered SGE, <sge_qc> is set to filtered. Otherwise, <sge_qc> is raw.
* <hexagon_width> represents the hexagon size.
(1) hexagon-indexed SGE
Description: This output is an SGE in which pixels have been segmented into hexagonal units. The size of the hexagons is defined by the user. The SGE is written in a TSV format compatible with FICTURE.
File Naming Convention:
File Format:
- random_index: Hexagon IDs.
- X: X-coordinates.
- Y: Y-coordinates.
- gene: Gene names.
- gn: The number of UMI counts for Gene per hexagon.
- gt: The number of UMI counts for GeneFull per hexagon.
- spl: The number of UMI counts for Spliced per hexagon.
- unspl: The number of UMI counts for Unspliced per hexagon.
- ambig: The number of UMI counts for Ambiguous per hexagon.
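A minimal sketch of reading such a table is shown below. It assumes an uncompressed, tab-separated file with a header row that uses the column names listed above; neither assumption is confirmed by this page, and the sample rows are invented purely for the example.

```python
import csv
import io
from collections import defaultdict


def total_umis_per_hexagon(handle, feature="gn"):
    """Sum one count column (e.g. gn or gt) over all genes for each hexagon."""
    reader = csv.DictReader(handle, delimiter="\t")
    totals = defaultdict(int)
    for row in reader:
        totals[row["random_index"]] += int(row[feature])
    return dict(totals)


# Invented sample data: two hexagons, three gene rows.
sample = (
    "random_index\tX\tY\tgene\tgn\tgt\tspl\tunspl\tambig\n"
    "1\t10.00\t20.00\tGeneA\t3\t4\t2\t1\t0\n"
    "1\t10.00\t20.00\tGeneB\t1\t1\t1\t0\t0\n"
    "2\t34.00\t20.00\tGeneA\t2\t2\t1\t1\t0\n"
)

gn_totals = total_umis_per_hexagon(io.StringIO(sample), feature="gn")
gt_totals = total_umis_per_hexagon(io.StringIO(sample), feature="gt")
```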
Output Guidelines
The output file can serve as input for Latent Dirichlet Allocation in FICTURE.
Parameters
- The mu_scale Parameter: Specify the coordinate-to-micron translation for hexagons. By default, the spatial digital gene expression (SGE) matrix is considered to be in nanometers.
- The segment Field
  - The hex_n_move Parameter: Specify the sliding steps. When hex_n_move is set to 1, a non-overlapping hexagon-indexed SGE is created.
  - The precision Parameter: Define the number of digits used to store spatial locations (in microns; 0 for integer).
  - The ficture Parameter
    - The min_density Parameter: Set the minimum density of UMI counts required when creating hexagons.
    - The char Parameter: Specify the characteristics of the hexagons, including the genomic feature used to create hexagons (solo_feature), the size of the hexagonal grid (hexagon_width), and whether gene-filtering and polygon-filtering should be applied (quality_control). Multiple sets of parameters may be given.
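To make the roles of these parameters concrete, the sketch below mirrors them in a Python dictionary and shows one plausible reading of mu_scale and precision. The nesting, all values, the helper names, and the interpretation of mu_scale as the number of coordinate units per micron are assumptions for illustration; the job configuration file remains the authoritative reference.

```python
# Hypothetical parameter set; every value is illustrative, not a documented default.
params = {
    "mu_scale": 1000,  # assumed: coordinate units (nanometers) per micron
    "segment": {
        "hex_n_move": 1,  # 1 gives non-overlapping hexagons
        "precision": 2,   # digits kept for locations in microns
        "ficture": {
            "min_density": 0.3,
            "char": [
                {"solo_feature": "gn", "hexagon_width": 24, "quality_control": True},
                {"solo_feature": "gn", "hexagon_width": 12, "quality_control": False},
            ],
        },
    },
}


def to_microns(coord, mu_scale, precision):
    """Convert a raw coordinate to microns, keeping `precision` digits."""
    value = round(coord / mu_scale, precision)
    return int(value) if precision == 0 else value


def sge_labels(char_sets):
    """List (label, hexagon_width) per parameter set: filtered or raw."""
    return [
        ("filtered" if c["quality_control"] else "raw", c["hexagon_width"])
        for c in char_sets
    ]


x_um = to_microns(123456, params["mu_scale"], params["segment"]["precision"])
x_int = to_microns(123456, params["mu_scale"], 0)
labels = sge_labels(params["segment"]["ficture"]["char"])
```

Listing several entries under char corresponds to requesting several hexagon-indexed SGEs, one per parameter set.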
Dependencies
When quality_control is enabled, Rule sdgeAR_segment_ficture can only be executed after the completion of Rules sdge2sdgeAR and sdgeAR_polygonfilter, along with their prerequisite rules. Otherwise, Rule sdgeAR_segment_ficture can only be executed after the completion of sdge2sdgeAR, sdgeAR_polygonfilter, sdgeAR_minmax, and their prerequisite rules.
See an overview of the rule dependencies in the Workflow Structure.
Code Snippet
The code for this rule is provided in c04_sdgeAR_segment_ficture.smk.