Quick Start¶

Workflow¶

msmu processes LC-MS/MS search outputs and produces an analysis-ready protein matrix. Each processing step is modular, and normalization, filtering, and aggregation can optionally be applied at any level (PSM, peptide, or protein group), depending on your analysis design.

1. Load DB search result (read functions)
2. (optional) PSM-level filtering
3. Log2 Transformation
4. (optional) PSM normalization
5. Aggregate to peptides
6. Protein inference
7. Aggregate to protein groups
8. Analyze
9. Save

Functions can be called from submodules:

pp: preprocessing (filtering, normalization, summarization, etc.)
tl: tools (PCA, UMAP, FASTA annotation, DE analysis, etc.)
pl: plotting (bar plots for IDs and charges, histograms, etc.)

Basic usage of msmu is shown below:

0. Import msmu¶

import msmu as mm

1. Load DB search result¶

Ingest outputs from DB search tools into a unified MuData object.

mdata = mm.read_sage("sage/output/dir/", label="tmt")

mdata

2. (optional) PSM-level filtering¶

Remove low-confidence PSMs / precursors (q-value, etc.).

mdata = mm.pp.add_filter(mdata, modality="psm", column="q_value", keep="lt", value=0.01)
mdata = mm.pp.apply_filter(mdata, modality="psm")
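
To make the effect of the filter above concrete, here is a minimal pandas sketch of a q-value cutoff. It assumes that `keep="lt"` means "strictly less than"; the table, peptide names, and values are hypothetical, and this is not msmu's implementation.

```python
import pandas as pd

# Hypothetical PSM table; values are made up for illustration.
psms = pd.DataFrame(
    {
        "peptide": ["AAK", "CDER", "EFGK", "HIK"],
        "q_value": [0.001, 0.02, 0.0099, 0.01],
    }
)

# Keep only rows whose q-value is strictly below the threshold
# (assumed meaning of keep="lt", value=0.01).
threshold = 0.01
kept = psms[psms["q_value"] < threshold].reset_index(drop=True)
print(kept)
```

Under a strict comparison, a PSM sitting exactly at the threshold is removed.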

3. Log2 Transformation¶

Apply a log2 transformation to the quantification matrix.
All subsequent steps assume log2-transformed values.

mdata = mm.pp.log2_transform(mdata, modality="psm")

mdata["psm"].to_df().T
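
As a rough illustration of the transformation itself, the NumPy sketch below applies log2 to a small made-up intensity matrix. Treating zeros as missing values is an assumption made here for the example; how msmu handles zeros is not stated in this guide.

```python
import numpy as np

# Hypothetical raw intensities (rows: PSMs, columns: samples); 0 marks a missing value.
raw = np.array([[1024.0, 2048.0], [0.0, 512.0]])

# Assumption: treat zeros as missing so that log2 does not produce -inf.
masked = np.where(raw > 0, raw, np.nan)
log2_values = np.log2(masked)
print(log2_values)
```

On the log2 scale, a difference of 1 between two values corresponds to a two-fold change in raw intensity.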

4. (optional) PSM normalization¶

Apply observation-wise (per-sample) normalization.

mdata = mm.pp.normalize(mdata, modality="psm", method="median", rescale=True)
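
The idea behind median normalization on log2 data can be sketched with NumPy as follows. The matrix is hypothetical, and the rescaling step (shifting back by the mean of the sample medians) is only one possible reading of `rescale=True`, not a description of msmu's internals.

```python
import numpy as np

# Hypothetical log2 intensities (rows: samples, columns: PSMs).
log2_x = np.array(
    [
        [10.0, 12.0, 14.0],
        [11.0, 13.0, 15.0],
    ]
)

# Per-sample median, ignoring missing values.
sample_median = np.nanmedian(log2_x, axis=1, keepdims=True)

# Center each sample on zero by subtracting its own median.
centered = log2_x - sample_median

# Assumed rescaling: shift all samples back by the mean of the sample medians
# so that values stay on the original intensity scale.
normalized = centered + sample_median.mean()
print(normalized)
```

After this step every sample shares the same median, so systematic loading differences between samples no longer dominate downstream comparisons.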

5. Aggregate to peptides¶

Summarize PSMs (or precursors) to the peptide level.
Filtering and normalization can optionally be applied at the peptide level as well.
Peptide-level q-values are calculated from their PEP (posterior error probability).

mdata = mm.pp.to_peptide(mdata, **summarization_args)
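
Conceptually, summarization collapses all PSMs that share a peptide into one row. The pandas sketch below uses the median as the summary statistic; that choice, the table, and its column names are assumptions for illustration, since the actual method is controlled by the summarization arguments, which are not listed here.

```python
import pandas as pd

# Hypothetical log2 PSM table: one row per PSM, one column per sample.
psm = pd.DataFrame(
    {
        "peptide": ["AAK", "AAK", "CDER", "CDER", "CDER"],
        "sample_1": [10.0, 12.0, 8.0, 9.0, 10.0],
        "sample_2": [11.0, 13.0, 7.0, None, 9.0],
    }
)

# Median over all PSMs of the same peptide; missing values are skipped.
peptide = psm.groupby("peptide").median()
print(peptide)
```

The median is less sensitive to a single outlying PSM than the mean, which is one reason it is a common choice on log-transformed intensities.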

6. Protein inference¶

Map peptides to protein groups.

mdata = mm.pp.infer_protein(mdata)
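
For readers new to protein inference, the following pure-Python sketch shows one simplified grouping rule: proteins supported by exactly the same set of peptides cannot be distinguished and are merged into one group. The mapping is hypothetical, and msmu's actual inference procedure may differ (for example, this sketch does not handle proteins whose peptides are a subset of another protein's).

```python
from collections import defaultdict

# Hypothetical peptide-to-protein mapping.
peptide_to_proteins = {
    "AAK": {"P1", "P2"},
    "CDER": {"P1", "P2"},
    "EFGK": {"P3"},
    "HIK": {"P3", "P4"},
}

# Invert the mapping: which peptides support each protein?
protein_to_peptides = defaultdict(set)
for pep, proteins in peptide_to_proteins.items():
    for prot in proteins:
        protein_to_peptides[prot].add(pep)

# Proteins with identical peptide sets are indistinguishable and form one group.
groups = defaultdict(list)
for prot, peps in protein_to_peptides.items():
    groups[frozenset(peps)].append(prot)

protein_groups = {";".join(sorted(prots)): set(peps) for peps, prots in groups.items()}

# A peptide is unique if it maps to exactly one protein group.
group_count = defaultdict(int)
for peps in protein_groups.values():
    for pep in peps:
        group_count[pep] += 1
unique_peptides = sorted(p for p, n in group_count.items() if n == 1)

print(protein_groups)
print(unique_peptides)
```

In this toy example `HIK` is shared between two groups, so it would not count as a unique peptide.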

7. Aggregate to protein groups¶

Generate the protein group-level matrix.
Only unique peptides are used for protein summarization.
Protein group-level q-values are calculated from their PEP (posterior error probability).

mdata = mm.pp.to_protein(mdata, **summarization_args)

mm.pl.plot_bar(mdata, modality="protein")
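
One common way to derive q-values from PEPs is to sort entries by PEP and take the running mean, which estimates the FDR among the top-ranked entries. The NumPy sketch below shows that approach on made-up values; whether msmu uses exactly this procedure is an assumption.

```python
import numpy as np

# Hypothetical posterior error probabilities, one per protein group.
pep = np.array([0.001, 0.2, 0.01, 0.05])

# Sort from most to least confident. The expected FDR among the top-k entries
# is the mean PEP of those k entries.
order = np.argsort(pep)
q_sorted = np.cumsum(pep[order]) / np.arange(1, len(pep) + 1)

# Put the q-values back in the original order.
q_values = np.empty_like(q_sorted)
q_values[order] = q_sorted
print(q_values)
```

Because the PEPs are sorted in ascending order, the running mean never decreases, so the resulting q-values are monotonic in rank.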

8. Analyze¶

Perform differential expression, PCA/UMAP, QC, missingness analysis, and other statistical workflows.

# PCA / UMAP
mdata = mm.tl.pca(mdata, modality="protein") # mdata = mm.tl.umap(mdata, modality="protein")
mm.pl.plot_pca(mdata, modality="protein")    # mm.pl.plot_umap(mdata, modality="protein")

# DEA
de_res = mm.tl.run_de(mdata, modality="protein", ctrl="control", expr="expr")
de_res.to_df()  # show the result as a pandas DataFrame
de_res.plot_volcano()   # show the result as a volcano plot
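
For intuition about what a PCA on the protein matrix computes, here is a small NumPy sketch based on the singular value decomposition. The matrix is hypothetical and assumed to be complete (no missing values); this is a generic PCA, not a description of how `mm.tl.pca` is implemented.

```python
import numpy as np

# Hypothetical complete log2 protein matrix (rows: samples, columns: protein groups).
# The first two samples and the last two samples mimic two conditions.
x = np.array(
    [
        [10.0, 12.0, 8.0],
        [10.5, 12.5, 8.2],
        [14.0, 9.0, 8.1],
        [14.2, 9.3, 7.9],
    ]
)

# Center each protein group, then decompose.
centered = x - x.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

scores = u * s                      # sample coordinates on the principal components
explained = s**2 / np.sum(s**2)     # fraction of variance per component
print(scores[:, :2])
print(explained)
```

In this toy matrix the two conditions separate along the first component, which is the kind of structure a PCA plot is meant to reveal.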

9. Save & Load h5mu¶

mdata.write_h5mu("file/name/to/save.h5mu")

mdata = mm.read_h5mu("file/name/mudata.h5mu")