tirank.TrainPre.Reject_With_GMM_Bio

tirank.TrainPre.Reject_With_GMM_Bio(pred_bulk, pred_sc, tolerance, min_components, max_components)

Performs GMM-based rejection for Classification and Cox modes.

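This function identifies phenotype-associated clusters in two steps. It first fits a Gaussian mixture model (GMM) to the bulk scores to find the target means (0 and 1), then fits a second GMM to the sc/st scores. An sc/st cluster counts as phenotype-associated when its mean lies within the given tolerance of a bulk target mean; cells in the remaining clusters are marked for rejection.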
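The alignment idea described above can be sketched with scikit-learn's `GaussianMixture`. This is an illustrative approximation, not the tirank implementation: the helper name `sketch_gmm_rejection`, the fixed `n_sc_components` argument, the `seed` argument, and the toy data are all assumptions made for the example, and the bulk targets are simply taken to be the fitted means of a two-component mixture.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def sketch_gmm_rejection(pred_bulk, pred_sc, tolerance, n_sc_components, seed=0):
    """Hypothetical helper: illustrates the alignment step only."""
    bulk = np.asarray(pred_bulk, dtype=float).reshape(-1, 1)
    sc = np.asarray(pred_sc, dtype=float).reshape(-1, 1)

    # Two-component GMM on the bulk scores; its fitted means act as the targets.
    bulk_gmm = GaussianMixture(n_components=2, n_init=3, random_state=seed).fit(bulk)
    targets = bulk_gmm.means_.ravel()

    # Separate GMM on the sc/st scores (component count fixed here for brevity).
    sc_gmm = GaussianMixture(
        n_components=n_sc_components, n_init=3, random_state=seed
    ).fit(sc)
    labels = sc_gmm.predict(sc)
    sc_means = sc_gmm.means_.ravel()

    # A cluster is "aligned" if its mean is within `tolerance` of any bulk target.
    dist_to_nearest_target = np.abs(sc_means[:, None] - targets[None, :]).min(axis=1)
    aligned = dist_to_nearest_target <= tolerance

    # 1 = reject (cell belongs to a non-aligned cluster), 0 = keep.
    return (~aligned[labels]).astype(int).reshape(-1, 1)


# Toy data (made up): bulk scores near 0 and 1, sc/st scores near 0, 0.5 and 1.
rng = np.random.default_rng(0)
toy_bulk = np.concatenate([rng.normal(0.0, 0.02, 100), rng.normal(1.0, 0.02, 100)])
toy_sc = np.concatenate([rng.normal(m, 0.02, 200) for m in (0.0, 0.5, 1.0)])

toy_mask = sketch_gmm_rejection(toy_bulk, toy_sc, tolerance=0.1, n_sc_components=3)
print(toy_mask.shape, int(toy_mask.sum()))
```

In this toy setup the middle group of sc/st scores sits far from both bulk targets, so its cluster is the one expected to be flagged for rejection.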

Parameters:
  • pred_bulk (np.ndarray) – Predicted scores from the bulk data (n_samples, 1).

  • pred_sc (np.ndarray) – Predicted scores from the sc/st data (n_cells, 1).

  • tolerance (float) – The maximum distance an sc/st cluster mean can be from a bulk target mean to be considered aligned.

  • min_components (int) – The minimum number of GMM components to try.

  • max_components (int) – The maximum number of GMM components to try.
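The page does not say how a component count between `min_components` and `max_components` is chosen. One common approach, shown here purely as an assumption, is to fit a mixture for each candidate count and keep the one with the lowest Bayesian information criterion (BIC). The helper name `select_gmm_by_bic` and the toy scores are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def select_gmm_by_bic(scores, min_components, max_components, seed=0):
    """Hypothetical helper: pick the component count with the lowest BIC."""
    x = np.asarray(scores, dtype=float).reshape(-1, 1)
    best_model, best_bic = None, np.inf
    for k in range(min_components, max_components + 1):
        model = GaussianMixture(n_components=k, n_init=3, random_state=seed).fit(x)
        bic = model.bic(x)  # lower BIC = better fit/complexity trade-off
        if bic < best_bic:
            best_model, best_bic = model, bic
    return best_model


# Toy scores (made up) drawn around three well-separated means.
rng = np.random.default_rng(1)
demo_scores = np.concatenate([rng.normal(m, 0.02, 200) for m in (0.0, 0.5, 1.0)])

chosen = select_gmm_by_bic(demo_scores, min_components=1, max_components=5)
print(chosen.n_components, chosen.means_.ravel())
```

A wider range costs more fits but lets the mixture capture intermediate score groups that a two-component model would merge into its neighbours.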

Returns:

A binary mask (n_cells, 1) where 1 indicates a cell to be rejected (phenotype-independent) and 0 indicates a cell to be kept.

Return type:

np.ndarray
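Because the mask has shape (n_cells, 1) and uses 1 for rejection, it needs to be flattened and inverted before it can select the kept cells. A minimal sketch with a hand-written stand-in mask follows; the `scores` and `reject_mask` values are made up for illustration and are not output from the function.

```python
import numpy as np

# Stand-in values: four cells with their scores and a rejection mask.
scores = np.array([[0.02], [0.48], [0.97], [0.51]])
reject_mask = np.array([[0], [1], [0], [1]])  # 1 = reject, 0 = keep

# Flatten to 1-D and invert so True marks the cells to keep.
keep = reject_mask.ravel() == 0

kept_scores = scores[keep]
n_rejected = int(reject_mask.sum())
print(kept_scores.ravel(), n_rejected)
```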