Skip to content

Accessing Example Datasets

Dataset Overview

There are three example datasets published with the NovaScope protocol. Each input dataset contains two types of FASTQ files: (a) 1st-seq (single-end) FASTQ file that contains spatial barcodes to construct a barcode map, and (b) 2nd-seq (paired-end) FASTQ files that contains spatial barcodes in Read 1, and cDNA sequences in Read 2.

Minimal Test Run Dataset

This is a small (1.14GB) test run dataset comprising of a subset of the (shallow) liver section data described in Shallow Liver Section Dataset. This dataset is meant to be used to test the sanity of the pipeline, without necessarily offering biologically meaningful interpretation of data.

Shallow Liver Section Dataset

This dataset is a typical (23.7GB) example of Seq-Scope dataset that can be initially generated for a tissue section. Typically, the 2nd-seq FASTQ files contain 150-200M paired-end reads. This should be sufficient to examine the spatial distribution of the transcripts across the tissue, assess the quality of dataset, identify major cell types and marker genes, and perform basic pixel-level decoding of the spatial transcriptome. If the quality of the initial dataset look great, one may decide to sequence the library much more deeply to maximize the information content. (see Deep Liver Section Dataset for more details)

Deep Liver Section Dataset

If the initial examination of the shallow dataset looks promising, one can sequence the library much more deeply, to the level of saturating the library. A deeply sequenced dataset typically contains multiple pairs of FASTQ files, possibly across multiple sequencing runs. The deeply sequenced liver section dataset, available at https://doi.org/10.7302/tw62-4f97, has 7 pairs of FASTQ files (250GB) in addition to the shallow dataset.

Downloading the Datasets

Each of the three datasets have their own DOIs, which can be accessed using the URLs below.

1
2
3
4
5
6
7
## To download the tarball from Zenodo, you can use the following command
wget "https://zenodo.org/records/10835761/files/B08Csub_20240318_raw.tar.gz"

## uncompress the tarball using the following command:
mkdir B08Ctest
cd B08Ctest
tar xzvf ../B08Csub_20240318_raw.tar.gz
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
## To download the tarball from Zenodo, you can use the following command

## create a directory to store the data
mkdir B08Cshallow
cd B08Cshallow

## download the 1st-seq FASTQ file
wget "https://zenodo.org/records/10840696/files/9203-AP.L3.B08C.R1_001.fastq.gz"

## download the 2nd-seq FASTQ files (R1 and R2)
wget "https://zenodo.org/records/10840696/files/9748-YK-3_CGAGGCTG_S3_R1_001.fastq.gz"
wget "https://zenodo.org/records/10840696/files/9748-YK-3_CGAGGCTG_S3_R2_001.fastq.gz"

## Additionally, you may want to download md5sum files 
## to verify the integrity of the downloaded files
  • Deep Liver Section Dataset (250GB) : Link to Deep Blue Data
    • Note that you need to use Globus to download the dataset.
    • Note that this dataset contains only additional 2nd-seq FASTQ files in addition to the Shallow Liver Section Dataset, so you need to download the shallow dataset first.