Rule sdgeAR_reformat¶
Purpose¶
The sdgeAR_reformat rule prepares the spatial digital gene expression matrix (SGE) in a FICTURE-compatible format. While preparing this SGE, the rule can also filter genes, depending on the user-defined parameters in the job configuration file.
Input Files¶
- Spatial Digital Gene Expression Matrix (SGE) and its Metadata File for Coordinates
The required inputs are an SGE and its metadata file containing the X and Y coordinates. These files must be stored in the sgeAR subfolder of the analysis directory. They can be generated by Rule sdge2sdgeAR or prepared manually by the user.
Output Files¶
The rule generates the following outputs in the specified directory path:
(1) FICTURE-compatible SGE¶
Description: An SGE in the FICTURE format, containing all of the information, including barcode information, feature information, and the counts for each genomic feature.
File Naming Convention:
File Format:
- #lane: lane ID
- tile: tile ID
- X: X-coordinate
- Y: Y-coordinate
- gene_id: Gene Ensembl ID
- gene: Gene symbol
- gn: the count per gene per barcode for Gene
- gt: the count per gene per barcode for GeneFull
- spl: the count per gene per barcode for Spliced
- unspl: the count per gene per barcode for Unspliced
- ambig: the count per gene per barcode for Ambiguous
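To make the column layout above concrete, here is a minimal Python sketch of reading one data row of a FICTURE-format SGE. It assumes the file is tab-delimited and gzip-compressed, that the column order matches the list above, that the header line starts with "#", and that all five count columns hold integers. The names SgeRecord, parse_sge_line and read_sge are hypothetical helpers for illustration only; they are not part of NovaScope or FICTURE.

```python
import gzip
from dataclasses import dataclass
from typing import Dict, Iterator

# Count columns in the order listed above (assumed to match the file header).
COUNT_COLUMNS = ("gn", "gt", "spl", "unspl", "ambig")
N_FIELDS = 6 + len(COUNT_COLUMNS)  # lane, tile, X, Y, gene_id, gene + counts


@dataclass
class SgeRecord:
    lane: str
    tile: str
    x: float
    y: float
    gene_id: str
    gene: str
    counts: Dict[str, int]


def parse_sge_line(line: str) -> SgeRecord:
    """Parse one tab-delimited data row of a FICTURE-format SGE (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != N_FIELDS:
        raise ValueError(f"expected {N_FIELDS} fields, got {len(fields)}")
    lane, tile, x, y, gene_id, gene = fields[:6]
    counts = {name: int(value) for name, value in zip(COUNT_COLUMNS, fields[6:])}
    return SgeRecord(lane, tile, float(x), float(y), gene_id, gene, counts)


def read_sge(path: str) -> Iterator[SgeRecord]:
    """Stream records from a gzip-compressed SGE, skipping the '#lane' header line."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            yield parse_sge_line(line)
```

For example, parsing a made-up row such as "1\t2101\t1500.5\t2300\tENSMUSG00000000001\tGnai3\t3\t4\t2\t1\t0" yields a record whose counts["gt"] is 4. Keeping the count columns in a single tuple means only one place needs to change if the actual header order differs.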
(2) Two Tab-delimited Feature Files¶
Description: These include a feature file (*.feature.tsv.gz) containing information for all features, and a second feature file (*.feature.clean.tsv.gz) containing information only for the features retained after gene filtering.
File Naming Convention:
File Format: Both feature files share the same format:
- gene_id: Gene Ensembl ID
- gene: Gene symbol
- gn: the total count per gene for Gene, summed over all barcodes
- gt: the total count per gene for GeneFull, summed over all barcodes
- spl: the total count per gene for Spliced, summed over all barcodes
- unspl: the total count per gene for Unspliced, summed over all barcodes
- ambig: the total count per gene for Ambiguous, summed over all barcodes
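The relationship between the SGE and a feature file can be sketched in Python, assuming that each count column in the feature file is the sum of the corresponding SGE column over all barcodes and that the file starts with a plain header line without a leading "#". The helpers build_feature_rows and format_feature_tsv are hypothetical and only illustrate the aggregation; they are not the rule's actual code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

COUNT_COLUMNS = ("gn", "gt", "spl", "unspl", "ambig")

FeatureRow = Tuple[str, str, int, int, int, int, int]


def build_feature_rows(
    sge_rows: Iterable[Tuple[str, str, Dict[str, int]]]
) -> List[FeatureRow]:
    """Sum every count column per (gene_id, gene) across all barcodes.

    Each input item is (gene_id, gene, counts), where counts maps the
    column names in COUNT_COLUMNS to per-barcode counts.
    """
    totals: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(
        lambda: dict.fromkeys(COUNT_COLUMNS, 0)
    )
    for gene_id, gene, counts in sge_rows:
        acc = totals[(gene_id, gene)]
        for col in COUNT_COLUMNS:
            acc[col] += counts.get(col, 0)
    # Sort by (gene_id, gene) so the output order is deterministic.
    return [
        (gene_id, gene, *(acc[c] for c in COUNT_COLUMNS))
        for (gene_id, gene), acc in sorted(totals.items())
    ]


def format_feature_tsv(rows: List[FeatureRow]) -> str:
    """Render feature rows as tab-delimited text with a header line (assumed layout)."""
    header = "\t".join(("gene_id", "gene") + COUNT_COLUMNS)
    lines = [header] + ["\t".join(map(str, row)) for row in rows]
    return "\n".join(lines) + "\n"
```

Under this assumption, the *.feature.clean.tsv.gz file would contain the same rows restricted to the genes that pass filtering, so both files can share one formatter.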
Output Guidelines¶
The output files can be used as input for FICTURE.
Parameters¶
- The keep_gene_type parameter specifies the types of genes to retain during gene filtering.
- The rm_gene_regex parameter defines the types of genes to exclude during gene filtering.
Info
Both parameters use regular expressions.
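A minimal Python sketch of how two regular-expression parameters like these could drive gene filtering. It assumes that keep_gene_type is matched against each gene's type (biotype) and rm_gene_regex against the gene symbol, using re.search semantics, and that a gene_id-to-type mapping is available from a gene annotation. The function filter_features and the gene_types mapping are hypothetical, and the example patterns are illustrative values, not the pipeline's defaults.

```python
import re
from typing import Dict, Iterable, List, Tuple


def filter_features(
    features: Iterable[Tuple[str, str]],
    gene_types: Dict[str, str],
    keep_gene_type: str,
    rm_gene_regex: str,
) -> List[Tuple[str, str]]:
    """Return (gene_id, gene) pairs that pass both regex filters.

    A feature is kept when its gene type (looked up in the hypothetical
    gene_types mapping) matches keep_gene_type AND its symbol does not
    match rm_gene_regex. Genes with no known type are dropped.
    """
    keep_re = re.compile(keep_gene_type)
    rm_re = re.compile(rm_gene_regex)
    kept = []
    for gene_id, gene in features:
        gene_type = gene_types.get(gene_id, "")
        if not keep_re.search(gene_type):
            continue  # type not in the allow-list
        if rm_re.search(gene):
            continue  # symbol matches the exclusion pattern
        kept.append((gene_id, gene))
    return kept


# Illustrative patterns (assumed, not taken from the job configuration):
# keep protein-coding and lncRNA genes, drop Gm-prefixed and mitochondrial genes.
EXAMPLE_KEEP_GENE_TYPE = r"protein_coding|lncRNA"
EXAMPLE_RM_GENE_REGEX = r"^Gm\d+|^mt-|^MT-"
```

Using re.search means a pattern matches anywhere in the string unless it is anchored with ^ or $, which is why the exclusion pattern anchors each prefix.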
Dependencies¶
Because sdgeAR_reformat requires input from Rule sdge2sdgeAR, Rule sdgeAR_reformat can only run after sdge2sdgeAR and its prerequisite rules have completed successfully. See an overview of the rule dependencies in the Workflow Structure.
Code Snippet¶
The code for this rule is provided in a07_sdgeAR_reformat.smk.