tirank.Dataloader
- tirank.Dataloader.generate_val(savePath, validation_proportion=0.15, mode=None)
Splits the bulk expression and clinical data into training and validation sets.
Loads the full bulk expression and clinical data, combines them, performs a random split, and saves the training and validation sets back to disk in the ‘2_preprocessing/split_data’ directory.
- Parameters:
savePath (str) – The main project directory path.
validation_proportion (float, optional) – The fraction of data to use for the validation set. Defaults to 0.15.
mode (str, optional) – The analysis mode (‘Cox’, ‘Classification’, ‘Regression’). This determines how many clinical-data columns are used. Defaults to None.
- Returns:
None
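The split that generate_val performs can be sketched in memory with pandas and NumPy. This is a hypothetical illustration, not TiRank's code: the helper name `split_train_val`, the `seed` parameter, the orientation (expression as genes x samples, clinical data as samples x columns) and the rule "Cox keeps two clinical columns (time, event), other modes keep one" are all assumptions. The real function also reads its inputs from, and writes its outputs to, the project directory, which the sketch omits.

```python
import numpy as np
import pandas as pd


def split_train_val(exp, cli, validation_proportion=0.15, mode="Cox", seed=0):
    """Hypothetical sketch of a random train/validation split of bulk data.

    Assumes `exp` is genes x samples and `cli` is samples x clinical columns,
    and that both share sample IDs.
    """
    # Assumption: Cox needs (time, event); other modes need a single label.
    n_cli_cols = 2 if mode == "Cox" else 1
    cli = cli.iloc[:, :n_cli_cols]
    # Keep only samples present in both tables.
    samples = exp.columns.intersection(cli.index)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(samples.to_numpy())
    n_val = int(round(len(shuffled) * validation_proportion))
    val_ids, train_ids = shuffled[:n_val], shuffled[n_val:]
    return exp[train_ids], cli.loc[train_ids], exp[val_ids], cli.loc[val_ids]
```

With 20 samples and the default proportion of 0.15, three samples end up in the validation set and seventeen in the training set.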
- class tirank.Dataloader.BulkDataset(*args: Any, **kwargs: Any)
Bases: Dataset
PyTorch Dataset class for bulk RNA-seq (gene pair) data.
Handles different analysis modes by returning the appropriate clinical labels (e.g., time and event for Cox, a single label for Classification).
- Parameters:
df_Xa (pd.DataFrame) – DataFrame of gene pair features (samples x gene pairs).
df_cli (pd.DataFrame or pd.Series) – DataFrame/Series with clinical information.
mode (str, optional) – Analysis mode. One of ‘Cox’, ‘Classification’, or ‘Regression’. Defaults to ‘Cox’.
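A map-style dataset only needs `__len__` and `__getitem__`, so the mode-dependent labels described above can be sketched as follows. The class name `BulkDatasetSketch`, the column order (time first, event second for ‘Cox’), the single label column for ‘Regression’, and the tuple layout returned by `__getitem__` are assumptions, not TiRank's actual implementation. The sketch subclasses `torch.utils.data.Dataset` when PyTorch is installed and falls back to `object` otherwise, so it runs either way.

```python
import numpy as np
import pandas as pd

try:
    from torch.utils.data import Dataset  # real base class when PyTorch is installed
except ImportError:  # fallback so the sketch runs without PyTorch
    Dataset = object


class BulkDatasetSketch(Dataset):
    """Hypothetical dataset mirroring BulkDataset's documented behavior."""

    def __init__(self, df_Xa, df_cli, mode="Cox"):
        # Rows are samples, columns are gene pairs.
        self.X = np.asarray(df_Xa, dtype=np.float32)
        cli = pd.DataFrame(df_cli)  # accept either a Series or a DataFrame
        self.mode = mode
        if mode == "Cox":
            # Assumed column order: survival time first, event indicator second.
            self.time = cli.iloc[:, 0].to_numpy(dtype=np.float32)
            self.event = cli.iloc[:, 1].to_numpy(dtype=np.float32)
        elif mode in ("Classification", "Regression"):
            # Assumed: a single label column for both non-survival modes.
            self.label = cli.iloc[:, 0].to_numpy()
        else:
            raise ValueError(f"Unknown mode: {mode!r}")

    def __len__(self):
        return self.X.shape[0]

    def __getitem__(self, idx):
        if self.mode == "Cox":
            return self.X[idx], self.time[idx], self.event[idx]
        return self.X[idx], self.label[idx]
```

The sketch pairs features and labels by row position, so both inputs are assumed to list samples in the same order.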
- class tirank.Dataloader.SCDataset(*args: Any, **kwargs: Any)
Bases: Dataset
PyTorch Dataset class for single-cell RNA-seq (gene pair) data.
- Parameters:
df_Xb (pd.DataFrame or np.ndarray) – Matrix of gene pair features (cells x gene pairs).
- class tirank.Dataloader.STDataset(*args: Any, **kwargs: Any)
Bases: Dataset
PyTorch Dataset class for Spatial Transcriptomics (gene pair) data.
- Parameters:
df_Xc (pd.DataFrame or np.ndarray) – Matrix of gene pair features (spots x gene pairs).
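SCDataset and STDataset take only a feature matrix and no clinical labels. A minimal sketch of such an unlabeled dataset follows. The class name `FeatureOnlyDataset` and the choice to return just the feature row from `__getitem__` are assumptions; the real classes may return additional items. As in the bulk sketch, the base class falls back to `object` when PyTorch is not installed.

```python
import numpy as np
import pandas as pd

try:
    from torch.utils.data import Dataset  # real base class when PyTorch is installed
except ImportError:  # fallback so the sketch runs without PyTorch
    Dataset = object


class FeatureOnlyDataset(Dataset):
    """Hypothetical unlabeled dataset in the style of SCDataset/STDataset.

    Rows are cells (SC) or spots (ST); columns are gene pairs.
    """

    def __init__(self, features):
        # np.asarray accepts both a DataFrame and an ndarray,
        # matching the documented input types.
        self.X = np.asarray(features, dtype=np.float32)

    def __len__(self):
        return self.X.shape[0]

    def __getitem__(self, idx):
        # Return one cell's or spot's gene pair feature vector.
        return self.X[idx]
```

Converting to a float32 array once in the constructor avoids repeated pandas indexing during training.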
- tirank.Dataloader.PackData(savePath, mode, infer_mode, batch_size=1024)
Loads all preprocessed data and packages it into PyTorch DataLoaders.
This function reads the training/validation gene pair matrices, clinical data, AnnData object, similarity matrix, and pathological labels from disk. It instantiates the Dataset classes (BulkDataset, STDataset, SCDataset) and wraps them in DataLoader objects. It also prepares the adjacency matrix (adj_A) and pathological labels (patholabels) for the model.
All resulting DataLoader objects and supporting data are saved to the ‘3_Analysis/data2train’ directory.
- Parameters:
savePath (str) – The main project directory path.
mode (str) – The analysis mode (‘Cox’, ‘Classification’, ‘Regression’).
infer_mode (str) – The inference data type (‘ST’ or ‘SC’).
batch_size (int, optional) – Batch size for the DataLoaders. Defaults to 1024.
- Returns:
None
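To see what `batch_size` implies: with PyTorch's default `drop_last=False`, a DataLoader over N items yields ceil(N / batch_size) batches, and the last batch may be smaller. The plain-Python sketch below reproduces that grouping for the non-shuffled case. The helpers `batch_indices` and `n_batches` are hypothetical illustrations, not part of TiRank or PyTorch. Whether PackData shuffles or drops the last batch is not stated here.

```python
import math


def batch_indices(n_samples, batch_size=1024):
    """Yield index ranges the way a non-shuffled DataLoader with
    drop_last=False groups samples into batches."""
    for start in range(0, n_samples, batch_size):
        yield range(start, min(start + batch_size, n_samples))


def n_batches(n_samples, batch_size=1024):
    # Batches per epoch when the final partial batch is kept.
    return math.ceil(n_samples / batch_size)
```

For example, 5000 spots with the default batch size of 1024 give five batches: four full batches of 1024 and a final batch of 904.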
- tirank.Dataloader.transform_test_exp(train_exp, test_exp)
Transforms a test expression matrix into a gene pair matrix using pairs from a training set.
Given a gene pair matrix from training (columns named ‘GeneA__GeneB’) and a new expression matrix (genes as rows), this function computes the gene pair values for the new data, using the same pairs as the training set.
- Parameters:
train_exp (pd.DataFrame) – The gene pair matrix from the training set. Its columns define the gene pairs to be used.
test_exp (pd.DataFrame) – The raw gene expression matrix for the test set (genes as rows, samples as columns).
- Returns:
A new gene pair matrix (samples x gene pairs) for the test set, with the same columns as train_exp.
- Return type:
pd.DataFrame
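This page does not say how a single gene pair value is computed. The sketch below assumes a binary relative-expression encoding: 1 when GeneA is expressed higher than GeneB in a sample, 0 otherwise. TiRank's actual encoding may differ. The function name `transform_test_exp_sketch` is hypothetical. The sketch does show the documented shape handling: pair names are parsed from the ‘GeneA__GeneB’ columns of train_exp, genes are looked up in the rows of test_exp, and the result is samples x gene pairs with the training columns. In this sketch a gene missing from test_exp raises a KeyError.

```python
import pandas as pd


def transform_test_exp_sketch(train_exp, test_exp):
    """Hypothetical re-implementation for illustration only.

    Assumes each gene pair value is 1 when GeneA > GeneB in a sample, else 0.
    """
    # Split 'GeneA__GeneB' column names into their two genes.
    pairs = [col.split("__", 1) for col in train_exp.columns]
    gene_a = [a for a, _ in pairs]
    gene_b = [b for _, b in pairs]
    # Rows of test_exp are genes; .loc returns them in pair order
    # as arrays of shape (n_pairs, n_samples).
    a_vals = test_exp.loc[gene_a].to_numpy()
    b_vals = test_exp.loc[gene_b].to_numpy()
    # Compare element-wise, then transpose to samples x gene pairs.
    values = (a_vals > b_vals).astype(int).T
    return pd.DataFrame(values, index=test_exp.columns, columns=train_exp.columns)
```

Reusing the training columns ensures that a model trained on train_exp sees test features in the same order.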
Functions
PackData | Loads all preprocessed data and packages it into PyTorch DataLoaders.
generate_val | Splits the bulk expression and clinical data into training and validation sets.
transform_test_exp | Transforms a test expression matrix into a gene pair matrix using pairs from a training set.
Classes
BulkDataset | PyTorch Dataset class for bulk RNA-seq (gene pair) data.
SCDataset | PyTorch Dataset class for single-cell RNA-seq (gene pair) data.
STDataset | PyTorch Dataset class for Spatial Transcriptomics (gene pair) data.