Step 2b. Inferring Cell Type Factors using Seurat¶
This example illustrates inferring cell type factors using Seurat. This process contains two stops that require manual evaluations at step 2b.2 and step 2b.4.
Prefix:
This documentation uses the following prefixes to illustrate input and output filenames, which are automatically assigned by the script.
1 |
|
- The
nfactor
will be determined in step 2b.5. - Variable details for the prefixes are in the Job Configuration.
Step 2b.1 Data Evaluation¶
This step applies the Seurat_analysis.R
script in test mode to
- assess and remove mitochondrial and hypothetical genes,
- examine the distribution of the number of spatial barcodes per hexagon (nCount_RNA) and the number of genes detected per hexagon (
Nfeature_RNA
), - evaluate assess the performance of different nFeature_RNA thresholds — 50, 100, 200, 300, 400, 500, 750, and 1000 — to filter the input hexagon-indexed SGE.
Input & Output
1 2 3 4 5 6 7 8 9 |
|
Commands:
1 |
|
Step 2b.2 Manually Selecting the Optimal nFeature_RNA Cutoff¶
Examine density plots generated by step 2b.1 to choose the optimal nFeature_RNA
cutoff to remove noise signals. For our example data, we applied a cut off of 500 for the deep liver section dataset, and 100 for the shallow liver section dataset and the minimal test dataset. It is optional to define x y ranges(X_min
, X_max
, Y_min
, and Y_max
).
Define the following variables to the input configuration file.
Example:
1 2 3 4 5 |
|
Step 2b.3 Seurat Clustering Analysis¶
This step starts with removing mitochondrial and hypothetical genes and filtering the hexagon-indexed SGE matrix by nFeature_RNA_cutoff
. When X Y coordinate ranges are applied, the hexagon-indexed SGE matrix will also be filtered by coordinates.
Subsequently, this step applies sctransform normalization followed by dimensionality reduction through Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP) embedding.
Next, the step employs FindClusters
to segregate hexagons into clusters utilizing a shared nearest neighbor (SNN) modularity optimization-based clustering algorithm. During this process, FindClusters
applies an argument of resolution
to determine the "granularity" of clusters, i.e., a higher resolution value yields more clusters. A range of resolutions
, including 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, and 2, are applied to explore the optimal resolution
. For each resolution, the step generates an UMAP for dimensionality reduction, a spatial plot to visualize the clusters and their spatial arrangement, and a CSV file of differentially expressed genes for each cluster.
Additionally, this step generates a metadata file containing information on the cluster assignment for each hexagon, and an RDS (R Data Serialization) file that stores the complete Seurat object with all the compiled data.
Input & Output
1 2 3 4 5 6 7 8 9 10 |
|
Commands:
1 |
|
Step 2b.4 Manually Selecting the Optimal Clustering Resolution¶
Examine the UMAP and spatial plots from the step 2b.3 and choose the optimal resolution
. Then, save this chosen resolution
into the input configuration file as the res_of_interest
variable.
Example:
1 |
|
Step 2b.5 Preparing a Count Matrix¶
Transform the metadata file into a count matrix to serve as the model matrix for the subsequent steps. This step automatically detects the number of clusters from the model matrix and assigns it as a nfactor
variable in the input configuration file.
Input & Output
1 2 3 4 5 6 7 8 |
|
Commands:
1 |
|