msmu.tl.pca

Perform Principal Component Analysis (PCA) on the specified modality of the MuData object.

References

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct), 2825-2830.

Maćkiewicz, A., & Ratajczak, W. (1993). Principal components analysis (PCA). Computers & Geosciences, 19(3), 303-342.

Parameters:

Name Type Description Default
mdata MuData

MuData object containing the data.

required
modality str

The modality to perform PCA on.

required
layer str | None

Layer to use as input for PCA. If None, the default layer (.X) is used.

None
n_components int | None

Number of components to keep. If n_components is not set, all components are kept:

n_components == min(n_samples, n_features)

If n_components == 'mle' and svd_solver == 'full', Minka's MLE is used to guess the dimension. Setting n_components == 'mle' causes svd_solver == 'auto' to be interpreted as svd_solver == 'full'.

If 0 < n_components < 1 and svd_solver == 'full', the number of components is chosen so that the cumulative explained variance exceeds the fraction given by n_components.

If svd_solver == 'arpack', the number of components must be strictly less than the minimum of n_features and n_samples. Hence, with 'arpack', the None case results in:

n_components == min(n_samples, n_features) - 1
None
svd_solver Literal['auto', 'full', 'arpack', 'randomized']

"auto": The solver is selected by a default policy based on X.shape and n_components. If the input data has fewer than 1000 features and more than 10 times as many samples, the "covariance_eigh" solver is used. Otherwise, if the input data is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, the more efficient "randomized" method is selected. Otherwise, the exact "full" SVD is computed and optionally truncated afterwards.

"full": Run exact full SVD by calling the standard LAPACK solver via scipy.linalg.svd, then select the components by postprocessing.

"arpack": Run SVD truncated to n_components by calling the ARPACK solver via scipy.sparse.linalg.svds. Strictly requires 0 < n_components < min(X.shape).

"randomized": Run randomized SVD by the method of Halko et al.

'auto'
random_state int | None

Only used by the 'arpack' and 'randomized' solvers. Pass an int for reproducible results across multiple function calls.

0
key_added str

Base key used for PCA outputs. Results are stored in:

- .obsm[key_added] for component scores
- .varm[key_added] for loadings
- .uns[key_added] for explained variance metadata

Defaults to "X_pca".

'X_pca'
**kwargs Any

Additional keyword arguments passed to the PCA constructor.

{}
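
The n_components, svd_solver, and random_state rules above match those of scikit-learn's PCA class. The sketch below exercises them directly on sklearn.decomposition.PCA with synthetic data. It assumes, but does not verify, that msmu.tl.pca forwards these arguments unchanged; the array X and all variable names are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 20 samples, 8 features (illustrative, not from msmu).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))

# n_components=None keeps min(n_samples, n_features) components.
pca_all = PCA(n_components=None, svd_solver="full").fit(X)
n_all = pca_all.n_components_

# A float in (0, 1) with svd_solver="full" keeps just enough components
# for the cumulative explained variance to exceed that fraction.
pca_frac = PCA(n_components=0.9, svd_solver="full").fit(X)
cum_var = float(np.sum(pca_frac.explained_variance_ratio_))

# "arpack" requires 0 < n_components < min(X.shape), so asking for
# min(X.shape) components is rejected with a ValueError.
try:
    PCA(n_components=min(X.shape), svd_solver="arpack").fit(X)
    arpack_rejected = False
except ValueError:
    arpack_rejected = True

# With a fixed random_state, the "randomized" solver gives the same
# scores on repeated calls.
a = PCA(n_components=3, svd_solver="randomized", random_state=0).fit_transform(X)
b = PCA(n_components=3, svd_solver="randomized", random_state=0).fit_transform(X)
reproducible = bool(np.allclose(a, b))
```

Passing svd_solver="full" explicitly in the first two cases avoids depending on the "auto" policy, which can pick a different solver depending on the data shape.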

Returns:

Type Description
MuData

Updated MuData object with PCA results.
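
To make the storage layout under key_added concrete, the following sketch computes the three kinds of output with scikit-learn and places them in plain dictionaries that stand in for .obsm, .varm, and .uns. It is not msmu's implementation: taking loadings as the transposed components_ matrix and the field names inside the uns entry are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 12 observations, 5 variables (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 5))

pca = PCA(n_components=3, svd_solver="full")
scores = pca.fit_transform(X)   # one row per observation
loadings = pca.components_.T    # one row per variable (assumed convention)

key_added = "X_pca"

# Plain dicts standing in for the .obsm / .varm / .uns slots.
obsm = {key_added: scores}
varm = {key_added: loadings}
uns = {
    key_added: {
        # Hypothetical field names; the source only says
        # "explained variance metadata".
        "variance": pca.explained_variance_,
        "variance_ratio": pca.explained_variance_ratio_,
    }
}
```

Scores are aligned with observations and loadings with variables, which is why they land in the observation-indexed and variable-indexed slots respectively.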