Feature Filtering (Add-on)
Overview
CartLoader provides an add-on module, named feature_filtering, to filter features (e.g., genes) from a CSV/TSV using explicit lists, substrings, regex patterns, or a type reference file.
The curated feature file can then be used downstream, including FICTURE analysis.
Example Usage
1) Filter by Regex
This example demonstrates an exclude-only pattern with --exclude-feature-regex; conversely, you could keep only matching features using --include-feature-regex.
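To make the exclude-only behavior concrete, here is a minimal Python sketch of what an exclude regex does to a list of feature names. It is not CartLoader's implementation: the helper `exclude_by_regex`, the toy feature names, and the pattern are all hypothetical, and search-style matching (so anchors such as `^` and `$` matter) is an assumption.

```python
import re

def exclude_by_regex(features, pattern):
    """Drop every feature whose name matches `pattern`.

    Assumption: search-style matching, so the pattern must carry its own
    anchors (^, $) if it should match the whole name.
    """
    rx = re.compile(pattern)
    return [f for f in features if not rx.search(f)]

# Toy feature names; the pattern is a hypothetical example, not a CartLoader default.
regex_features = ["Actb", "mt-Co1", "Gm12345", "BLANK_0001", "Gapdh"]
regex_kept = exclude_by_regex(regex_features, r"^(mt-|Gm\d+$|BLANK_)")
print(regex_kept)
```

An include regex would be the mirror image: keep a feature only when the pattern matches.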
2) Filter by Lists
CartLoader supports include-only and exclude-only list-based filtering; pick the one that fits your use case (or combine them). Below is an example using --include-feature-list.
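As an illustration of include-only list filtering, the sketch below reads a feature list and keeps only the names found in it. The list format (one feature name per line) is an assumption, and `read_feature_list` and `include_by_list` are hypothetical helpers rather than CartLoader functions.

```python
import io

def read_feature_list(handle):
    """Read feature names, one per line, skipping blank lines (format assumed)."""
    return {line.strip() for line in handle if line.strip()}

def include_by_list(features, allowed):
    """Keep only features whose name appears in `allowed`, preserving input order."""
    return [f for f in features if f in allowed]

# In-memory stand-in for the file passed to --include-feature-list.
list_file = io.StringIO("Actb\nGapdh\n\nAlb\n")
allowed_names = read_feature_list(list_file)
list_kept = include_by_list(["Actb", "mt-Co1", "Gapdh", "Xist"], allowed_names)
print(list_kept)
```

An exclude list would invert the membership test and drop the listed names instead.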
3) Filter by Substrings
Below, include and exclude are shown together for illustration; you may use either one independently.
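The combined include/exclude substring case can be sketched as follows. This is only an illustration under the assumption of case-sensitive substring matching; `filter_by_substr` and the example names are hypothetical.

```python
def filter_by_substr(features, include_substr=None, exclude_substr=None):
    """Keep features containing `include_substr` and lacking `exclude_substr`.

    Either argument may be None, in which case that check is skipped.
    Assumption: matching is case-sensitive.
    """
    kept = []
    for name in features:
        if include_substr is not None and include_substr not in name:
            continue  # fails the include constraint
        if exclude_substr is not None and exclude_substr in name:
            continue  # hit by the exclude constraint
        kept.append(name)
    return kept

substr_features = ["Col1a1", "Col3a1", "Col1a1-ps", "Actb"]
substr_kept = filter_by_substr(substr_features, include_substr="Col", exclude_substr="-ps")
print(substr_kept)
```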
4) Filter by Feature Types
These examples show include-by-type filtering using a reference file; adapt the regex to the types you want to keep. Two examples are given below:
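The idea behind include-by-type filtering can be sketched in Python as a lookup from feature name to feature type, followed by a regex test on the type. Everything here is illustrative: the reference column names `gene_name` and `gene_type`, the helper `include_by_type`, and the choice to drop features that are missing from the reference are assumptions, not documented CartLoader behavior.

```python
import csv
import io
import re

# In-memory stand-in for a tab-delimited reference file with a header row.
ref_tsv = (
    "gene_name\tgene_type\n"
    "Actb\tprotein_coding\n"
    "Xist\tlncRNA\n"
    "Gm12345\tpseudogene\n"
)
type_of = {
    row["gene_name"]: row["gene_type"]
    for row in csv.DictReader(io.StringIO(ref_tsv), delimiter="\t")
}

def include_by_type(features, type_of, pattern):
    """Keep features whose reference type matches `pattern`.

    Assumption: features absent from the reference are dropped.
    """
    rx = re.compile(pattern)
    return [f for f in features if f in type_of and rx.search(type_of[f])]

type_features = ["Actb", "Xist", "Gm12345", "Novel1"]
coding_only = include_by_type(type_features, type_of, r"^protein_coding$")
coding_or_lnc = include_by_type(type_features, type_of, r"^(protein_coding|lncRNA)$")
print(coding_only, coding_or_lnc)
```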
Actions
Action Specifications
No action runs by default. Activate at least one filtering criterion using the filter parameters.
Include or exclude features by name using explicit lists, substrings, or regex patterns, or include them by feature type. Note that:
- Include criteria are restrictive (a feature must satisfy all provided include constraints).
- Exclude criteria are subtractive (a feature matching any exclude constraint is removed).
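The two rules above amount to "all includes must pass, and no exclude may hit". A minimal sketch of that logic, with criteria modeled as plain predicates (the helper `keep_feature` and the sample criteria are hypothetical):

```python
import re

def keep_feature(name, includes, excludes):
    """Return True when `name` passes every include and matches no exclude.

    With no include criteria, all([]) is True, so an exclude-only setup
    keeps everything that is not explicitly excluded.
    """
    return all(test(name) for test in includes) and not any(test(name) for test in excludes)

# Hypothetical criteria: an include list plus an include regex, and two excludes.
include_tests = [
    lambda n: n in {"Actb", "Gapdh", "mt-Co1", "Col1a1"},
    lambda n: re.search(r"^[A-Z]", n) is not None,
]
exclude_tests = [
    lambda n: n.startswith("mt-"),
    lambda n: "BLANK" in n,
]

candidates = ["Actb", "Gapdh", "mt-Co1", "Xist", "BLANK_Col1a1"]
combined_kept = [n for n in candidates if keep_feature(n, include_tests, exclude_tests)]
print(combined_kept)
```

Here `mt-Co1` is in the include list but fails the second include test and also hits an exclude, while `Xist` fails the include list; both are removed.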
Parameters
Input/Output Parameters
- --in-csv (str, required): Input CSV/TSV of features (plain text or gzipped).
- --out-csv (str, required): Output filtered CSV/TSV; gzipped.
- --out-record (str): Optional TSV with feature and filtering reason per feature.
- --csv-colname-feature-name (str): Feature column name in the input file (default: gene).
- --csv-delim (str): Field delimiter for the input (applies to output as well; default: \t).
- --chunksize (int): Chunk size for streaming reads (default: 50000).
- --log (flag): Write logs to feature_filtering.log next to outputs.
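For intuition about how the delimiter, feature column, and chunk size interact, the following standard-library sketch streams a gzipped table in fixed-size chunks and writes a gzipped result. It is an assumption-laden illustration, not CartLoader's code: `filter_stream` is hypothetical, and the real tool may read and write differently.

```python
import csv
import gzip
import io
from itertools import islice

def filter_stream(src, dst, keep, colname="gene", delim="\t", chunksize=50000):
    """Copy rows from `src` to `dst`, `chunksize` rows at a time,
    keeping only rows whose `colname` value satisfies `keep`."""
    reader = csv.DictReader(src, delimiter=delim)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                            delimiter=delim, lineterminator="\n")
    writer.writeheader()
    n_kept = 0
    while True:
        chunk = list(islice(reader, chunksize))  # at most `chunksize` rows in memory
        if not chunk:
            break
        rows = [r for r in chunk if keep(r[colname])]
        writer.writerows(rows)
        n_kept += len(rows)
    return n_kept

# In-memory stand-ins for the gzipped input and output files.
raw_table = "gene\tcount\nActb\t10\nmt-Co1\t7\nGapdh\t3\n"
in_buf = io.BytesIO(gzip.compress(raw_table.encode()))
out_buf = io.BytesIO()
with gzip.open(in_buf, "rt", newline="") as src, \
        gzip.open(out_buf, "wt", newline="") as dst:
    n_rows_kept = filter_stream(src, dst, keep=lambda g: not g.startswith("mt-"),
                                chunksize=2)
filtered_text = gzip.decompress(out_buf.getvalue()).decode()
print(n_rows_kept)
print(filtered_text)
```

A small chunk size is used here only so that the toy input spans more than one chunk.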
Filters Parameters
List Filters Parameters
- --include-feature-list (str): Path to a file with feature names to include.
- --exclude-feature-list (str): Path to a file with feature names to exclude.
Substring Filters Parameters
- --include-feature-substr (str): Include features containing the substring.
- --exclude-feature-substr (str): Exclude features containing the substring.
Regex Filters Parameters
- --include-feature-regex (str): Include features matching the regex.
- --exclude-feature-regex (str): Exclude features matching the regex.
Type Filters Parameters
Reference file columns (name vs. index)
If the reference file has a header row, specify the feature-name and feature-type columns with --feature-type-ref-colname-*; otherwise, use the 0-based index flags --feature-type-ref-colidx-*.
- --include-feature-type-regex (str): Include by feature type (e.g., ^protein_coding$).
- --feature-type-ref (str): Reference file with feature name and type columns.
- --feature-type-ref-delim (str): Delimiter for the reference file (default: \t).
- --feature-type-ref-colname-name (str): Column name for feature name.
- --feature-type-ref-colname-type (str): Column name for feature type.
- --feature-type-ref-colidx-name (int): 0-based column index for feature name.
- --feature-type-ref-colidx-type (int): 0-based column index for feature type.
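The name-versus-index choice for reference columns can be pictured with a short sketch. The loader below is hypothetical (`load_type_map` and its keyword names merely echo the flags above), and it assumes that column names imply a header row while indices imply none.

```python
import csv
import io

def load_type_map(handle, delim="\t", colname_name=None, colname_type=None,
                  colidx_name=None, colidx_type=None):
    """Build a {feature name: feature type} map from a reference table.

    Column names are resolved against a header row; otherwise the 0-based
    indices are used and every row is treated as data.
    """
    rows = csv.reader(handle, delimiter=delim)
    if colname_name is not None and colname_type is not None:
        header = next(rows)  # consume the header row
        i_name, i_type = header.index(colname_name), header.index(colname_type)
    elif colidx_name is not None and colidx_type is not None:
        i_name, i_type = colidx_name, colidx_type
    else:
        raise ValueError("give both column names or both column indices")
    return {r[i_name]: r[i_type] for r in rows if r}

with_header = io.StringIO("gene_id\tgene_name\tgene_type\nG1\tActb\tprotein_coding\nG2\tXist\tlncRNA\n")
without_header = io.StringIO("G1\tActb\tprotein_coding\nG2\tXist\tlncRNA\n")

map_by_name = load_type_map(with_header, colname_name="gene_name", colname_type="gene_type")
map_by_index = load_type_map(without_header, colidx_name=1, colidx_type=2)
print(map_by_name == map_by_index)
```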
Outputs
A gzipped CSV/TSV file containing the remaining features.
If --out-record is set, a TSV mapping each feature to a filtering reason; the reason is empty when a feature is kept.
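To show the shape of such a record, here is a hedged sketch that pairs each feature with a reason string and leaves it empty for kept features. The reason labels (`not_in_include_list`, `exclude_regex`), the column header, and the helper `filtering_reason` are invented for illustration; the labels CartLoader actually writes may differ.

```python
import re

def filtering_reason(name, include_list=None, exclude_regex=None):
    """Return a reason string for a removed feature, or "" when it is kept.

    The reason labels are hypothetical placeholders.
    """
    if include_list is not None and name not in include_list:
        return "not_in_include_list"
    if exclude_regex is not None and re.search(exclude_regex, name):
        return "exclude_regex"
    return ""

record_rows = [
    (f, filtering_reason(f, include_list={"Actb", "mt-Co1"}, exclude_regex=r"^mt-"))
    for f in ["Actb", "mt-Co1", "Xist"]
]

# Render as tab-separated text with a hypothetical two-column header.
record_tsv = "feature\treason\n" + "".join(f"{name}\t{reason}\n" for name, reason in record_rows)
print(record_tsv)
```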