tirank.Dataloader

tirank.Dataloader.generate_val(savePath, validation_proportion=0.15, mode=None)

Splits the bulk expression and clinical data into training and validation sets.

Loads the full bulk expression and clinical data, combines them, performs a random split, and saves the training and validation sets back to disk in the ‘2_preprocessing/split_data’ directory.

Parameters:
  • savePath (str) – The main project directory path.

  • validation_proportion (float, optional) – The fraction of data to use for the validation set. Defaults to 0.15.

  • mode (str, optional) – The analysis mode (‘Cox’, ‘Classification’, ‘Regression’). This determines how many columns to use for the clinical data.

Returns:

None
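The combine, split, and separate steps described above can be sketched as follows. This is a minimal illustration and not the tirank implementation: the helper name `split_train_val`, the `seed` argument, and the assumption that samples are rows in both tables are hypothetical, and the sketch returns the pieces rather than writing them to disk.

```python
import numpy as np
import pandas as pd


def split_train_val(exp, cli, validation_proportion=0.15, seed=0):
    """Randomly split samples into training and validation sets.

    exp: samples x features DataFrame; cli: clinical DataFrame on the same index.
    Hypothetical helper for illustration only.
    """
    # Combine first so expression and clinical rows stay aligned during the split.
    combined = exp.join(cli, how="inner")
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(combined))
    n_val = int(round(len(combined) * validation_proportion))
    val = combined.iloc[shuffled[:n_val]]
    train = combined.iloc[shuffled[n_val:]]
    # Separate the clinical columns again after splitting.
    cli_cols = list(cli.columns)
    return (train.drop(columns=cli_cols), train[cli_cols],
            val.drop(columns=cli_cols), val[cli_cols])


# Toy data: 20 samples, 2 features, Cox-style clinical columns (assumed names).
exp = pd.DataFrame(np.arange(40.0).reshape(20, 2),
                   index=[f"s{i}" for i in range(20)], columns=["g1", "g2"])
cli = pd.DataFrame({"time": np.arange(20.0), "event": [0, 1] * 10},
                   index=exp.index)
train_x, train_y, val_x, val_y = split_train_val(exp, cli, 0.15)
print(len(train_x), len(val_x))
```

Joining before the split is one simple way to guarantee that each sample's expression row and clinical row end up in the same subset.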

class tirank.Dataloader.BulkDataset(*args: Any, **kwargs: Any)

Bases: Dataset

PyTorch Dataset class for bulk RNA-seq (gene pair) data.

Handles different analysis modes by returning the appropriate clinical labels (e.g., time and event for Cox, a single label for Classification).

Parameters:
  • df_Xa (pd.DataFrame) – DataFrame of gene pair features (samples x gene pairs).

  • df_cli (pd.DataFrame or pd.Series) – DataFrame/Series with clinical information.

  • mode (str, optional) – Analysis mode. One of ‘Cox’, ‘Classification’, or ‘Regression’. Defaults to ‘Cox’.
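To illustrate how a dataset can return different labels per analysis mode, here is a hedged sketch. The class name `BulkDatasetSketch` is hypothetical, and the column layout (survival time in the first clinical column, event flag in the second for ‘Cox’; a single label column otherwise) is an assumption, not something this page confirms.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset


class BulkDatasetSketch(Dataset):
    """Illustrative stand-in for a mode-aware bulk dataset (not tirank's code)."""

    def __init__(self, df_Xa, df_cli, mode="Cox"):
        if mode not in ("Cox", "Classification", "Regression"):
            raise ValueError(f"Unsupported mode: {mode}")
        self.mode = mode
        self.Xa = torch.tensor(df_Xa.values, dtype=torch.float32)
        cli = pd.DataFrame(df_cli)  # accept a Series or a DataFrame
        if mode == "Cox":
            # Assumption: column 0 holds survival time, column 1 the event flag.
            self.time = torch.tensor(cli.iloc[:, 0].values, dtype=torch.float32)
            self.event = torch.tensor(cli.iloc[:, 1].values, dtype=torch.float32)
        else:
            # Integer class labels for classification, float targets for regression.
            dtype = torch.long if mode == "Classification" else torch.float32
            self.label = torch.tensor(cli.iloc[:, 0].values, dtype=dtype)

    def __len__(self):
        return len(self.Xa)

    def __getitem__(self, idx):
        if self.mode == "Cox":
            return self.Xa[idx], self.time[idx], self.event[idx]
        return self.Xa[idx], self.label[idx]


features = pd.DataFrame([[1, -1], [-1, 1], [1, 1]], columns=["A__B", "A__C"])
clinical = pd.DataFrame({"time": [5.0, 8.0, 2.0], "event": [1, 0, 1]})
cox_ds = BulkDatasetSketch(features, clinical, mode="Cox")
clf_ds = BulkDatasetSketch(features, clinical["event"], mode="Classification")
print(len(cox_ds), len(cox_ds[0]), len(clf_ds[0]))
```

In this sketch a ‘Cox’ item is a (features, time, event) triple, while the other two modes yield a (features, label) pair.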

class tirank.Dataloader.SCDataset(*args: Any, **kwargs: Any)

Bases: Dataset

PyTorch Dataset class for single-cell RNA-seq (gene pair) data.

Parameters:
  • df_Xb (pd.DataFrame or np.ndarray) – DataFrame or array of gene pair features (cells x gene pairs).

class tirank.Dataloader.STDataset(*args: Any, **kwargs: Any)

Bases: Dataset

PyTorch Dataset class for Spatial Transcriptomics (gene pair) data.

Parameters:
  • df_Xc (pd.DataFrame or np.ndarray) – DataFrame or array of gene pair features (spots x gene pairs).

tirank.Dataloader.PackData(savePath, mode, infer_mode, batch_size=1024)

Loads all preprocessed data and packages it into PyTorch DataLoaders.

This function reads the training/validation gene pair matrices, clinical data, AnnData object, similarity matrix, and pathological labels from disk. It instantiates the Dataset classes (BulkDataset, STDataset, SCDataset) and wraps them in DataLoader objects. It also prepares the adjacency matrix (adj_A) and pathological labels (patholabels) for the model.

All resulting DataLoader objects and supporting data are saved to the ‘3_Analysis/data2train’ directory.

Parameters:
  • savePath (str) – The main project directory path.

  • mode (str) – The analysis mode (‘Cox’, ‘Classification’, ‘Regression’).

  • infer_mode (str) – The inference data type (‘ST’ or ‘SC’).

  • batch_size (int, optional) – Batch size for the DataLoaders. Defaults to 1024.

Returns:

None
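The wrapping step can be pictured with plain PyTorch objects. The sketch below uses `TensorDataset` with random toy tensors as stand-ins for the bulk and single-cell datasets. It does not read any tirank files, and the shuffle settings are an assumption (shuffling the training loader, keeping inference order fixed).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins: 10 bulk samples with labels, 25 cells without labels.
bulk_train = TensorDataset(torch.randn(10, 4), torch.randint(0, 2, (10,)))
sc_data = TensorDataset(torch.randn(25, 4))

batch_size = 8  # small for the demo; the documented default is 1024
train_loader = DataLoader(bulk_train, batch_size=batch_size, shuffle=True)
infer_loader = DataLoader(sc_data, batch_size=batch_size, shuffle=False)

# Each batch from a one-tensor TensorDataset arrives as a one-element sequence.
for (xb,) in infer_loader:
    print(xb.shape)
```

With 25 cells and a batch size of 8, the inference loader yields three full batches and one final batch holding the remaining cell.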

tirank.Dataloader.transform_test_exp(train_exp, test_exp)

Transforms a test expression matrix into a gene pair matrix using pairs from a training set.

Given a gene pair matrix from training (columns are ‘GeneA__GeneB’) and a new expression matrix (genes as rows), this function computes the gene pair values for the new data, matching the pairs from training.

Parameters:
  • train_exp (pd.DataFrame) – The gene pair matrix from the training set. Its columns define the gene pairs to be used.

  • test_exp (pd.DataFrame) – The raw gene expression matrix for the test set (genes as rows, samples as columns).

Returns:

A new gene pair matrix (samples x gene pairs) for the test set, with the same columns as train_exp.

Return type:

pd.DataFrame
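As a rough illustration of matching test data to training gene pairs, the sketch below parses each ‘GeneA__GeneB’ column name and computes one value per sample. The helper name `pairs_from_training` is hypothetical. Two behaviours are assumptions that this page does not confirm: the pair value is taken as the sign of GeneA minus GeneB, and pairs with a gene missing from the test matrix are filled with 0.

```python
import numpy as np
import pandas as pd


def pairs_from_training(train_exp, test_exp):
    """Build a samples x gene-pairs matrix for test_exp using train_exp's columns.

    Assumptions (illustrative only): pair value = sign(GeneA - GeneB);
    pairs with a gene absent from test_exp are filled with 0.
    """
    result = {}
    for pair in train_exp.columns:
        gene_a, gene_b = pair.split("__")
        if gene_a in test_exp.index and gene_b in test_exp.index:
            diff = test_exp.loc[gene_a] - test_exp.loc[gene_b]
            result[pair] = np.sign(diff).astype(int)
        else:
            result[pair] = pd.Series(0, index=test_exp.columns)
    # Rows are the test samples; columns follow the training pair order.
    return pd.DataFrame(result, index=test_exp.columns)


# Only the column names of the training matrix matter here.
train_exp = pd.DataFrame(columns=["TP53__MYC", "MYC__EGFR", "TP53__BRCA1"])
test_exp = pd.DataFrame({"s1": [5.0, 2.0, 2.0], "s2": [1.0, 3.0, 7.0]},
                        index=["TP53", "MYC", "EGFR"])
test_pairs = pairs_from_training(train_exp, test_exp)
print(test_pairs)
```

Keeping the training column order means the resulting matrix can be fed to a model trained on train_exp without reindexing.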

Functions
  • tirank.Dataloader.PackData – Loads all preprocessed data and packages it into PyTorch DataLoaders.

  • tirank.Dataloader.generate_val – Splits the bulk expression and clinical data into training and validation sets.

  • tirank.Dataloader.transform_test_exp – Transforms a test expression matrix into a gene pair matrix using pairs from a training set.

Classes
  • tirank.Dataloader.BulkDataset – PyTorch Dataset class for bulk RNA-seq (gene pair) data.

  • tirank.Dataloader.SCDataset – PyTorch Dataset class for single-cell RNA-seq (gene pair) data.

  • tirank.Dataloader.STDataset – PyTorch Dataset class for Spatial Transcriptomics (gene pair) data.