opendvp.tl.impute_gaussian

opendvp.tl.impute_gaussian(adata, mean_shift=-1.8, std_dev_shift=0.3, perSample=False, layer_key='unimputed', uns_key='impute_gaussian_qc_metrics')

Impute missing values in an AnnData object using a Gaussian distribution.

This function replaces each missing value in the data matrix with a random draw from a Gaussian distribution whose mean is shifted by mean_shift standard deviations and whose standard deviation is scaled by std_dev_shift. Imputation is performed per protein (column) by default, or per sample (row) when perSample=True.
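One way to read the two shift parameters, written out as formulas. This is a sketch based on the parameter descriptions on this page, and it assumes that the reference mean and standard deviation (here called $\mu_{\text{obs}}$ and $\sigma_{\text{obs}}$, names introduced only for illustration) are computed from the non-missing values of the column or row being imputed:

```latex
\mu_{\text{imp}} = \mu_{\text{obs}} + \texttt{mean\_shift} \cdot \sigma_{\text{obs}},
\qquad
\sigma_{\text{imp}} = \texttt{std\_dev\_shift} \cdot \sigma_{\text{obs}},
\qquad
x_{\text{missing}} \sim \mathcal{N}\!\left(\mu_{\text{imp}},\, \sigma_{\text{imp}}^{2}\right)
```

Under that reading and the default values, a protein with observed mean 20 and standard deviation 2 would be imputed from a Gaussian with mean $20 - 1.8 \cdot 2 = 16.4$ and standard deviation $0.3 \cdot 2 = 0.6$.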

The original, un-imputed data matrix is stored in adata.layers[layer_key]. A DataFrame with quality control metrics for the imputation is stored in adata.uns[uns_key]. The QC metrics include the number of imputed values, the mean and standard deviation used for imputation, and a NumPy array of the imputed values themselves for each feature.
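The behaviour described above can be approximated on a plain NumPy array. The following is a minimal sketch, not the opendvp implementation: the helper name `gaussian_impute_sketch`, its `per_sample` and `seed` arguments, the QC dictionary keys, the use of NaN to mark missing values, and the choice of the sample standard deviation (`ddof=1`) are all assumptions made for illustration.

```python
import numpy as np


def gaussian_impute_sketch(X, mean_shift=-1.8, std_dev_shift=0.3,
                           per_sample=False, seed=0):
    """Illustrative shifted-Gaussian imputation (hypothetical helper).

    Returns the imputed matrix, the untouched original, and a list of
    per-column (or per-row) QC records.
    """
    rng = np.random.default_rng(seed)
    original = np.asarray(X, dtype=float)
    # Transpose when imputing per sample so the loop always walks columns.
    work = original.T.copy() if per_sample else original.copy()
    qc = []
    for j in range(work.shape[1]):
        col = work[:, j]  # view into `work`, so assignments stick
        missing = np.isnan(col)
        observed = col[~missing]
        # Skip vectors with nothing to impute or too few values for a std.
        if not missing.any() or observed.size < 2:
            continue
        obs_std = observed.std(ddof=1)
        mu = observed.mean() + mean_shift * obs_std  # shifted mean
        sigma = obs_std * std_dev_shift              # scaled std
        draws = rng.normal(mu, sigma, size=int(missing.sum()))
        col[missing] = draws
        qc.append({
            "index": j,
            "n_imputed": int(missing.sum()),
            "imputation_mean": mu,
            "imputation_stddev": sigma,
            "imputed_values": draws,
        })
    imputed = work.T if per_sample else work
    return imputed, original, qc


# Toy matrix: 4 samples (rows) x 2 proteins (columns), NaN = missing.
X = np.array([[20.0, 10.0],
              [22.0, np.nan],
              [np.nan, 12.0],
              [18.0, 11.0]])
imputed, original, qc = gaussian_impute_sketch(X)
# Column 0 has observed mean 20 and std 2, so its draws come from a
# Gaussian with mean 16.4 and std 0.6.
```

Returning `original` alongside `imputed` mirrors the idea of keeping the un-imputed matrix in a layer, and the `qc` list plays the role of the QC DataFrame kept in `adata.uns`.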

Return type:

AnnData

Parameters

adata : ad.AnnData

AnnData object with missing values to impute.

mean_shift : float, default -1.8

Number of standard deviations to shift the mean of the Gaussian distribution.

std_dev_shift : float, default 0.3

Factor to scale the standard deviation of the Gaussian distribution.

perSample : bool, default False

If True, impute per sample (row); if False, impute per protein (column).

layer_key : str, default 'unimputed'

Key under which to store the original, un-imputed data matrix in adata.layers.

uns_key : str, default 'impute_gaussian_qc_metrics'

Key under which to store the imputation QC metrics DataFrame in adata.uns.

Returns

ad.AnnData

AnnData object with imputed values in .X, the original matrix in .layers[layer_key], and QC metrics in .uns[uns_key].