
Upload to AWS S3

Overview

Use upload_aws to publish CartLoader outputs (PMTiles, decoded spatial factors, and the catalog) to Amazon S3 for sharing or deployment. The command supports single-dataset uploads and collection uploads (multiple datasets via --in-list). File lists are read from catalog.yaml and grouped into three categories: cartload basics (required outputs), cartload optional files (e.g., UMAP, alias), and additional basemaps (non-SGE PMTiles).


Action

Upload CartLoader outputs (including catalog.yaml) to a specified S3 path for a single dataset or a collection (--in-list).


Requirements

  • A completed run of run_cartload2 or run_cartload2_multi, which produces:
    • Rasterized SGE tiles
    • (Optional) Decoded spatial factor maps
    • (Optional) Joined molecule-factor outputs
    • (Optional) Cell assets
    • (Optional) Background assets, such as histology
    • A catalog file (catalog.yaml) summarizing the output structure and metadata
  • AWS CLI installed and configured (e.g., aws configure).
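
As a quick preflight check before uploading, the sketch below tests whether the AWS CLI can be found and whether credentials resolve. The function names are hypothetical helpers, not part of cartloader; `aws sts get-caller-identity` is a standard AWS CLI call that fails when no valid credentials are configured.

```shell
# Hypothetical helper: succeed if the given AWS CLI executable can be found.
# The argument defaults to "aws", matching the default of --aws.
aws_cli_available() {
  command -v "${1:-aws}" >/dev/null 2>&1
}

# Hypothetical helper: succeed if the CLI can authenticate with the
# currently configured credentials (e.g., those set up by `aws configure`).
aws_credentials_ok() {
  "${1:-aws}" sts get-caller-identity >/dev/null 2>&1
}
```

A typical call would be `aws_cli_available aws && aws_credentials_ok aws`, with a non-zero exit status indicating that the requirement is not yet met.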

Example Usage

AWS_DIR="s3://example/aws/prefix"   # S3 prefix that will contain the dataset
DATA_ID="EXAMPLE_ID"                # dataset identifier

cartloader upload_aws \
  --in-dir /path/to/cartload2/output \
  --s3-dir "${AWS_DIR}/${DATA_ID}" \
  --n-jobs 10

# Collection upload (multiple datasets via --in-list)
AWS_DIR="s3://example/aws/prefix"        # S3 prefix that will contain the collection
COLLECTION_ID="EXAMPLE_COLLECTION_ID"    # collection identifier
IN_LIST=/path/to/samples.tsv             # one sample ID per line; no header
cartloader upload_aws \
  --in-dir /path/to/parent_cartload_outputs \
  --s3-dir "${AWS_DIR}/${COLLECTION_ID}" \
  --in-list "${IN_LIST}" \
  --n-jobs 10

Collection Structure

The examples below show input and S3 output layouts for a collection upload.

Input Collection

  • Generated by cartloader run_cartload2_multi.
  • --in-dir is a parent directory containing one subdirectory per sample.
  • Each subdirectory must include its own catalog.yaml.
/path/to/parent_cartload_outputs/   # defined by `--in-dir`
├── SAMPLE_001      # SAMPLE 1; sample ID appears in `--in-list`
│   ├── catalog.yaml
│   ├── ...         # SGE assets (e.g., genes_all.pmtiles)
│   ├── ...         # FICTURE assets (e.g., *-model.tsv, *-pseudobulk.tsv.gz, *-bulk-de.tsv, *-info.tsv, *.pmtiles, *rgb.tsv)
│   ├── ...         # Background assets (e.g., from `cartloader import_image`)
│   └── ...         # Cell assets (e.g., from `import_*_cell`)
├── SAMPLE_002      # SAMPLE 2
│   └── ...         
├── SAMPLE_003      # SAMPLE 3
│   └── ...   
└── SAMPLE_004      # SAMPLE 4
    └── ...
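
Because each sample subdirectory must include its own catalog.yaml, it can help to check the layout before starting a collection upload. The following is a minimal sketch under stated assumptions: `find_missing_catalogs` is a hypothetical helper (not a cartloader command), and the list file is assumed to hold exactly one sample ID per line with no header.

```shell
# Hypothetical helper: print every sample ID from the list whose
# subdirectory under in_dir lacks a catalog.yaml.
# Returns 1 if at least one catalog is missing, 0 otherwise.
find_missing_catalogs() {
  local in_dir="$1" in_list="$2" missing=0 sample_id
  # The `|| [ -n ... ]` part also handles a final line without a newline.
  while IFS= read -r sample_id || [ -n "${sample_id}" ]; do
    [ -z "${sample_id}" ] && continue   # skip blank lines
    if [ ! -f "${in_dir}/${sample_id}/catalog.yaml" ]; then
      printf '%s\n' "${sample_id}"
      missing=1
    fi
  done < "${in_list}"
  return "${missing}"
}
```

For example, `find_missing_catalogs /path/to/parent_cartload_outputs /path/to/samples.tsv` prints nothing and exits with status 0 when every listed sample is ready for upload.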

Output S3 Collection

  • --s3-dir is the collection prefix (e.g., s3://bucket/collection-id).
  • Each dataset uploads to a subdirectory under that prefix: .../<s3_id>/.
  • Subdirectory naming: use catalog.yaml:id when available; otherwise lowercase the sample ID from --in-list and replace underscores with hyphens.
/path/to/parent_s3_directory/   # defined by `--s3-dir` (often the collection ID)
├── sample-001      # SAMPLE 1; `<s3_id>` from catalog `id`, else lowercased sample ID with underscores → hyphens
│   ├── catalog.yaml
│   └── ...         # uploaded assets listed in catalog.yaml
├── sample-002      # SAMPLE 2
│   └── ...         
├── sample-003      # SAMPLE 3
│   └── ...   
└── sample-004      # SAMPLE 4
    └── ...
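
The subdirectory naming rule can be sketched in shell to predict where each sample will land. This is an illustration, not cartloader's own code: both function names are hypothetical, and `resolve_s3_id` assumes that `id` is a top-level, unquoted scalar on its own line in catalog.yaml. A real YAML parser would be more robust.

```shell
# Fallback rule from the text: lowercase the sample ID,
# then replace underscores with hyphens.
fallback_s3_id() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr '_' '-'
}

# Hypothetical helper: prefer the catalog's `id` when present,
# otherwise fall back to the transformed sample ID.
# Assumption: `id: value` appears unindented and unquoted in catalog.yaml.
resolve_s3_id() {
  local sample_id="$1" catalog="$2" id=""
  if [ -f "${catalog}" ]; then
    id="$(sed -n 's/^id:[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "${catalog}" | head -n 1)"
  fi
  if [ -n "${id}" ]; then
    printf '%s\n' "${id}"
  else
    fallback_s3_id "${sample_id}"
  fi
}
```

With this sketch, `fallback_s3_id SAMPLE_001` yields `sample-001`, which matches the subdirectory names in the layout above.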

Parameters

Input/Output

  • --in-dir (str): Input directory (single dataset) or parent directory containing per-sample subdirectories (collection).
  • --catalog-yaml (str): Path to catalog.yaml (single mode only; default: <in_dir>/catalog.yaml; not valid with --in-list).
  • --upload-basics-only (flag): Upload only cartload-generated basic files (skip optional and additional basemaps).
  • --upload-optional-only (flag): Upload only cartload-generated optional files, such as UMAP and alias (skip basics and basemaps).
  • --upload-basemap-only (flag): Upload only additional basemap PMTiles (skip basics and optionals).

Collection Parameters

  • --in-list (str): TSV of sample IDs (no header); enables collection mode. Use the same TSV you pass as --in-list to run_ficture2_multi and run_cartload2_multi.

AWS Configuration

  • --s3-dir (str): S3 destination (single) or parent prefix for per-sample outputs (collection).
  • --aws (str): aws CLI executable (default: aws).

Run Parameters

  • --dry-run (flag): Generate the Makefile; do not execute.
  • --restart (flag): Ignore existing outputs and rerun from scratch.
  • --n-jobs (int): Number of parallel jobs (default: 2).
  • --makefn (str): Name of the generated Makefile (default: upload_aws.mk, written inside --in-dir).
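
To review what would be uploaded before transferring anything, --dry-run can be combined with the other flags listed above. The wrapper below is a minimal sketch: `preview_upload` and the `CARTLOADER` variable are hypothetical conveniences introduced here, while the flags themselves are the ones documented in this section.

```shell
# Executable name is overridable, so the wrapper can be exercised
# on a machine where cartloader is not installed.
CARTLOADER="${CARTLOADER:-cartloader}"

# Hypothetical wrapper: generate the upload Makefile without executing it.
# Arguments after the first two (e.g., --upload-basics-only or
# --in-list FILE) are passed through unchanged.
preview_upload() {
  local in_dir="$1" s3_dir="$2"
  shift 2
  "${CARTLOADER}" upload_aws \
    --in-dir "${in_dir}" \
    --s3-dir "${s3_dir}" \
    --dry-run "$@"
}
```

For example, `preview_upload /path/to/cartload2/output "s3://example/aws/prefix/EXAMPLE_ID" --upload-basics-only` should leave the generated Makefile (upload_aws.mk by default) inside the input directory for inspection.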