Differential Expression (DE) Analysis
Overview
Differential Expression (DE) Analysis identifies proteins or peptides with significant abundance changes between experimental conditions. msmu provides permutation-based statistical testing to assess differential expression while controlling the false discovery rate (FDR).
mm.tl.run_de()
The run_de() function performs a non-parametric permutation test to evaluate differential expression between two groups. It calculates p-values based on the distribution of test statistics obtained from permuted group labels.
This function uses Welch's t-statistic by default, which is suitable for unequal variances between groups. Other statistics such as Student's t-statistics, Wilcoxon's W-statistics (rank-sum) test, and median difference are also available.
For multiple testing correction, msmu supports empirical FDR, and Benjamini-Hochberg method. Empirical FDR is recommended when using permutation tests.
n_resamples specifies the number of random permutations to generate the null distribution. If set to None, a simple hypothesis test without permutations is performed. The default of 1000 permutations provides a practical balance between statistical accuracy and computational cost.
If sample sizes are too small to meet n_resamples, all possible permutations are used to compute exact p-values (exact test).
Log2 fold-change (log2FC) between the two groups is calculated as the difference of log2-transformed median values.
p-value from the test is computed with the proportion of permuted statistics that are as extreme or more extreme than the observed statistic in null distribution with two-sided test.
q-value with empirical FDR is calculated by E[FDR] = pi0 * E[FP] / E[TP] referred to Yang Xie et al., Bioinformatics, 2011. and Storey et al., 2003.
See more details in the msmu.tl.run_de and usage examples in the tutorial DE Analysis.
de_res = mm.tl.run_de(
mdata,
modality="protein", # or "peptide"
category="condition", # column in .obs defining groups
ctrl="control", # control group label
expr="treated", # experimental group label
stat_method="welch", # options: "welch", "student", "wilcoxon", default "welch"
measure="median", # options: "mean", "median", default "median"
min_pct=0.5, # minimum fraction of non-missing values in at least one group, default 0.5
fdr="empirical", # options: "empirical", "bh", or False, default "empirical"
n_resamples=1000, # number of permutations, default 1000, if None, simple hypothesis test is performed
log_transformed=True # whether data is log-transformed, default True
)
de_res.to_df() # get results as pandas DataFrame
DE analysis results are stored in DeaResult object, which contains: Feature names, test statistics, log2 fold-changes, p-values, q-values, and other relevant information.
DE results can be accessed as a pandas DataFrame using the to_df() method.
Visualization of DEA Results
msmu provides visualization function to explore DEA results with volcano plots.
de_res.plot_volcano(
log2fc_cutoff=None, # (optional) log2 fold-change cutoff line, default None which shows fc_pct_5 line
pval_cutoff=0.05, # (optional) p-value cutoff line, default 0.05
label_top_n=5, # (optional) number of top significant features to label, default None (no labels)
)