Rule sdgeAR_polygonfilter¶
Purpose¶
The sdgeAR_polygonfilter rule filters the transcript-indexed spatial digital gene expression (SGE) matrix by UMI density in polygons.
Input Files¶
- An SGE in FICTURE-compatible Format: A transcript-indexed SGE in the FICTURE format, generated by Rule sdgeAR_reformat.
- A Tab-delimited Clean Feature File: (Required) The clean feature file from Rule sdgeAR_featurefilter.
- A Metadata File for X Y Coordinates: A metadata file with the minimum and maximum X Y coordinates, used to determine the major axis. It is generated by Rule sdge2sdgeAR or prepared manually by the user.
Output Files¶
The rule generates the following output in the specified directory path:
(1) A Filtered SGE Matrix in FICTURE-compatible Format¶
Description: A filtered SGE matrix in FICTURE-compatible TSV format.
File Naming Convention:
- <solo_feature>: Genomic feature.
File Format:
- #lane: lane ID
- tile: tile ID
- X: X-coordinate
- Y: Y-coordinate
- gene_id: Gene Ensembl ID
- gene: Gene symbol
- gn: the count per gene per barcode for Gene
- gt: the count per gene per barcode for GeneFull
- spl: the count per gene per barcode for Spliced
- unspl: the count per gene per barcode for Unspliced
- ambig: the count per gene per barcode for Ambiguous
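To illustrate the column layout above, the following minimal sketch parses a few tab-delimited rows and sums one count column per (X, Y) position. The sample rows, the gene identifiers, and the helper name `umi_per_position` are invented for illustration; they are not part of NovaScope or FICTURE, and coordinates are kept as strings because their numeric type is not specified here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; every value below is a placeholder.
SAMPLE_TSV = (
    "#lane\ttile\tX\tY\tgene_id\tgene\tgn\tgt\tspl\tunspl\tambig\n"
    "1\t1\t100\t200\tGENE_ID_A\tGeneA\t2\t3\t2\t1\t0\n"
    "1\t1\t100\t200\tGENE_ID_B\tGeneB\t1\t1\t1\t0\t0\n"
    "1\t1\t150\t260\tGENE_ID_A\tGeneA\t4\t4\t3\t1\t0\n"
)


def umi_per_position(handle, count_column="gn"):
    """Sum one count column for each (X, Y) pair in a tab-delimited SGE table."""
    reader = csv.DictReader(handle, delimiter="\t")
    totals = defaultdict(int)
    for row in reader:
        # Coordinates stay as strings; only the count column is parsed.
        totals[(row["X"], row["Y"])] += int(row[count_column])
    return dict(totals)


totals = umi_per_position(io.StringIO(SAMPLE_TSV))
print(totals)
```

Passing `count_column="gt"` (or `spl`, `unspl`, `ambig`) sums a different count column in the same way.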
(2) Two Filtered Tab-delimited Feature Files¶
Description: Two feature files representing genes filtered by the strict boundary and by the lenient boundary, respectively.
File Naming Convention:
- <solo_feature>: Genomic feature.
File Format: Those two feature files share the same format:
- gene_id: Gene Ensembl ID
- gene: Gene symbol
- gn: the count per gene per barcode for Gene
- gt: the count per gene per barcode for GeneFull
- spl: the count per gene per barcode for Spliced
- unspl: the count per gene per barcode for Unspliced
- ambig: the count per gene per barcode for Ambiguous
(3) Two Boundary JSON Files¶
Description: One strict boundary file and one lenient boundary file. Each boundary is stored as a set of coordinates in a JSON file.
File Naming Convention:
- <solo_feature>: Genomic feature.
File Format: See details for JSON files at: https://en.wikipedia.org/wiki/JSON.
(4) A Metadata File for X Y Coordinates¶
Description: This file contains the minimum and maximum X Y coordinates for the filtered SGE matrix.
File Naming Convention:
- <solo_feature>: Genomic feature.
File Format:
- xmin: The minimum x-coordinate in micrometers across all barcodes in the filtered SGE matrix.
- xmax: The maximum x-coordinate in micrometers across all barcodes in the filtered SGE matrix.
- ymin: The minimum y-coordinate in micrometers across all barcodes in the filtered SGE matrix.
- ymax: The maximum y-coordinate in micrometers across all barcodes in the filtered SGE matrix.
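As a rough illustration of what these four values mean, the sketch below computes them from a list of barcode positions. It assumes the positions are already expressed in micrometers, and it says nothing about the on-disk layout of the metadata file; the function name `coordinate_bounds` and the sample points are hypothetical.

```python
def coordinate_bounds(points):
    """Return xmin, xmax, ymin, ymax for an iterable of (x, y) pairs."""
    xs, ys = zip(*points)
    return {"xmin": min(xs), "xmax": max(xs), "ymin": min(ys), "ymax": max(ys)}


# Hypothetical barcode positions, assumed to be in micrometers.
points_um = [(12.5, 40.0), (3.0, 18.2), (27.1, 5.4)]
bounds = coordinate_bounds(points_um)
print(bounds)
```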
Output Guidelines¶
The output files can be used as input for FICTURE.
Parameters¶
- The radius Parameter: The radius refers to the circumradius (the radius of the circumscribed circle around the polygon). It is used to calculate both the polygon diameter and the polygon area.
- The hex_n_move Parameter: Defines the number of moves used when collapsing to polygons. When hex_n_move is 1, non-overlapping polygons are used; otherwise, overlapping polygons are used.
- The polygon_min_size Parameter: If provided, small and isolated polygons are removed. The size is given in square micrometers.
- The quartile Parameter: Specifies which quartile of the data is used for polygon filtering. The quartile defines the strict density cutoff. It has four options: 0, 1, 2, and 3, which correspond to the minimum, first quartile, median, and third quartile, respectively.
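The relationship between these parameters can be sketched as follows. This is only an illustration under two assumptions that this section does not confirm: that the polygons are regular hexagons (suggested by the name hex_n_move) with "diameter" meaning the vertex-to-vertex distance, and that the quartile cutoff is taken with standard inclusive quantiles. The function names `hexagon_geometry` and `density_cutoff` are hypothetical and do not come from the rule's code.

```python
import math
import statistics


def hexagon_geometry(radius):
    """Diameter and area of a regular hexagon given its circumradius."""
    diameter = 2 * radius                       # vertex-to-vertex distance
    area = 3 * math.sqrt(3) / 2 * radius ** 2   # (3*sqrt(3)/2) * R^2
    return diameter, area


def density_cutoff(densities, quartile):
    """Map quartile 0/1/2/3 to the minimum, Q1, median, or Q3 of the densities."""
    if quartile not in (0, 1, 2, 3):
        raise ValueError("quartile must be 0, 1, 2, or 3")
    if quartile == 0:
        return min(densities)
    # Assumption: inclusive quantiles; the rule may compute them differently.
    q1, q2, q3 = statistics.quantiles(densities, n=4, method="inclusive")
    return (q1, q2, q3)[quartile - 1]


diameter, area = hexagon_geometry(15)
cutoff = density_cutoff([1, 2, 3, 4, 5], quartile=2)
print(diameter, round(area, 2), cutoff)
```

Under these assumptions, a polygon_min_size threshold in square micrometers could be compared against multiples of the single-hexagon area returned by `hexagon_geometry`.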
Dependencies¶
Rule sdgeAR_polygonfilter executes only after sdge2sdgeAR, sdgeAR_reformat, sdgeAR_featurefilter, and their prerequisites are completed. See the Workflow Structure for dependencies.
Code Snippet¶
The code for this rule is provided in c03_sdgeAR_polygonfilter.smk.