# Model Input

This page summarizes the expected **input formats** for TiRank and provides practical guidance for matching example scripts, the Snakemake workflow, and your own datasets.

## Supported analysis modes

TiRank supports three primary modes (driven by the bulk phenotype definition):

- **Cox (survival analysis)**: time + event indicators
- **Classification**: binary or multi-class labels (commonly 0/1)
- **Regression**: continuous phenotype score

Inference data can be **spatial transcriptomics (ST)** or **single-cell RNA-seq (SC)**.

---

## 1) Bulk RNA-seq expression matrix

**Format**: CSV/TSV (recommended), readable by pandas.

**Recommended orientation**:

- **Rows = genes**
- **Columns = samples**

**Requirements**:

- Gene identifiers should be consistent (e.g., HGNC gene symbols) and match across datasets where applicable.
- Sample IDs must match those used in the bulk clinical table.

**Example files (Zenodo example resources)**:

- `GSE39582_exp_os.csv` (bulk expression)

---

## 2) Bulk clinical / phenotype table

**Format**: CSV/TSV.

**Requirements**:

- A sample identifier column that matches the bulk expression column names.
- Columns required depend on mode:

### Cox (survival)

Minimum required columns:

- `sample_id`
- `time` (numeric; follow-up time)
- `event` (0/1; 1 = event occurred)

Example file:

- `GSE39582_clinical_os.csv`

### Classification

Minimum required columns:

- `sample_id`
- `label` (e.g., 0/1)

Example files:

- `Liu2019_meta.csv` (metadata / labels)

### Regression

Minimum required columns:

- `sample_id`
- `score` (numeric phenotype)

---

## 3) Spatial transcriptomics (ST) input

TiRank supports common ST data representations used in Python pipelines.

### A) Visium-style folder input

A directory containing standard Visium outputs (e.g., matrix + spatial metadata).

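As a rough illustration of what such a folder usually holds, the sketch below checks a directory for the files commonly found in a 10x Space Ranger `outs/` layout. The file names are the usual Space Ranger ones; whether TiRank needs exactly this set is an assumption, and `missing_visium_parts` is a hypothetical helper, not part of TiRank.

```python
from pathlib import Path
import tempfile

# Usual 10x Space Ranger output names. Assumption: TiRank's loader may accept
# a different or smaller set of files than the ones checked here.
MATRIX_CANDIDATES = ["filtered_feature_bc_matrix.h5", "filtered_feature_bc_matrix"]
POSITION_CANDIDATES = [
    "spatial/tissue_positions_list.csv",  # older Space Ranger releases
    "spatial/tissue_positions.csv",       # newer Space Ranger releases
]
SCALEFACTORS = "spatial/scalefactors_json.json"


def missing_visium_parts(folder):
    """Return the names of the expected parts that are absent from `folder`."""
    root = Path(folder)
    missing = []
    if not any((root / p).exists() for p in MATRIX_CANDIDATES):
        missing.append("expression matrix")
    if not any((root / p).exists() for p in POSITION_CANDIDATES):
        missing.append("spot positions")
    if not (root / SCALEFACTORS).exists():
        missing.append("scale factors")
    return missing


# Demo on a throwaway directory so the sketch runs without real data.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "example_sample"
    (folder / "spatial").mkdir(parents=True)
    (folder / "filtered_feature_bc_matrix.h5").touch()
    (folder / "spatial" / "tissue_positions.csv").touch()
    before = missing_visium_parts(folder)  # scale factors not created yet
    (folder / "spatial" / "scalefactors_json.json").touch()
    after = missing_visium_parts(folder)

print(before)
print(after)
```

An empty list means all three parts were found; anything else names what to look for before running the ST examples.
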
In the TiRank examples, the ST input is provided as a **folder**:

- `SN048_A121573_Rep1/` (example ST folder)

Example placement (recommended):

- `data/ExampleData/CRC_ST_Prog/SN048_A121573_Rep1/`

### B) AnnData (optional)

If you already have an `.h5ad` AnnData object for ST, you may adapt the example scripts accordingly.

---

## 4) Single-cell RNA-seq (SC) input

**Format**: AnnData `.h5ad` (recommended).

**Requirements** (typical):

- Expression stored in `X` (cells × genes)
- Cell-level metadata stored in `obs` (e.g., patient/sample identifiers and optional covariates)

Example file:

- `GSE120575.h5ad`

Recommended placement:

- `data/ExampleData/SKCM_SC_Res/GSE120575.h5ad`

---

## 5) Pretrained model files (if required by your workflow)

Some workflows require pretrained files such as `ctranspath.pth`.

Recommended placement for CLI/workflow:

- `data/pretrainModel/ctranspath.pth`

Recommended placement for Web GUI:

- `Web/data/pretrainModel/ctranspath.pth`

The example resources are hosted on Zenodo:

- https://zenodo.org/records/18275554

---

## 6) Example resources (recommended starting point)

We provide example datasets and pretrained assets on Zenodo for reproducible testing:

- https://zenodo.org/records/18275554

A recommended local structure:

```
TiRank/
├── data/
│   ├── pretrainModel/
│   │   └── ctranspath.pth
│   └── ExampleData/
│       ├── CRC_ST_Prog/
│       └── SKCM_SC_Res/
└── workflow/
    ├── Snakefile
    └── config/config.yaml
```

If you use different locations, update the paths in the example scripts or in `workflow/config/config.yaml`.
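
Before running any workflow on your own data, it can help to confirm that the bulk inputs from sections 1 and 2 agree with each other. The following is a minimal sketch using pandas; `check_bulk_inputs` is a hypothetical helper (not a TiRank function), and it assumes the clinical table uses the column names listed in section 2 and that the expression matrix is oriented as genes × samples.

```python
import io

import pandas as pd

# Minimum clinical columns per mode, as listed in section 2.
REQUIRED_COLUMNS = {
    "cox": ["sample_id", "time", "event"],
    "classification": ["sample_id", "label"],
    "regression": ["sample_id", "score"],
}


def check_bulk_inputs(expr, clinical, mode):
    """Return a list of problems; an empty list means the basic checks pass.

    expr:     DataFrame with rows = genes, columns = samples.
    clinical: DataFrame with one row per sample.
    Hypothetical helper for illustration; not part of TiRank.
    """
    if mode not in REQUIRED_COLUMNS:
        raise ValueError(f"unknown mode: {mode!r}")

    missing_cols = [c for c in REQUIRED_COLUMNS[mode] if c not in clinical.columns]
    if missing_cols:
        return [f"clinical table is missing columns: {missing_cols}"]

    problems = []
    # Sample IDs must match between expression columns and the clinical table.
    expr_samples = set(expr.columns.astype(str))
    clin_samples = set(clinical["sample_id"].astype(str))
    only_expr = sorted(expr_samples - clin_samples)
    only_clin = sorted(clin_samples - expr_samples)
    if only_expr:
        problems.append(f"samples only in expression matrix: {only_expr}")
    if only_clin:
        problems.append(f"samples only in clinical table: {only_clin}")

    # Mode-specific type checks.
    if mode == "cox":
        if not pd.api.types.is_numeric_dtype(clinical["time"]):
            problems.append("`time` must be numeric")
        if not clinical["event"].dropna().isin([0, 1]).all():
            problems.append("`event` must be coded 0/1")
    if mode == "regression" and not pd.api.types.is_numeric_dtype(clinical["score"]):
        problems.append("`score` must be numeric")
    return problems


# Toy tables standing in for real files (made-up values, for illustration only).
expr_csv = "gene,S1,S2,S3\nTP53,5.1,4.8,6.0\nKRAS,2.3,2.9,3.1\n"
clin_csv = "sample_id,time,event\nS1,34.0,1\nS2,50.5,0\nS4,12.0,1\n"

expr = pd.read_csv(io.StringIO(expr_csv), index_col=0)  # first column = gene IDs
clinical = pd.read_csv(io.StringIO(clin_csv))

problems = check_bulk_inputs(expr, clinical, "cox")
for problem in problems:
    print(problem)
```

With the toy tables above, the check flags `S3` (expression only) and `S4` (clinical only) as unmatched. For real files, replace the `io.StringIO` inputs with file paths, and pass `sep="\t"` to `pd.read_csv` for TSV files.
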