DDA - Label-Free (with FlashLFQ)
This tutorial shows how to analyze DDA label-free quantification (LFQ) data by combining a database (DB) search tool for identification with FlashLFQ for quantification.
For DDA label-free analysis, we sometimes need a stand-alone label-free quantification tool such as FlashLFQ, which offers more detailed quantification options (e.g., match-between-runs, MBR).
FlashLFQ tutorials and installation guides are available in the FlashLFQ documentation. FlashLFQ can be run through Docker, a GUI, or a conda environment.
import msmu as mm
import pandas as pd
Read Data and PSM Filtering
In this tutorial, we use the PXD012986 dataset (Uszkoreit et al., 2022), which is also used in the DDA-LFQ tutorial section.
To combine FlashLFQ quantification with msmu, we first read the PSM result file from the DB search tool (here, Sage) and filter the PSMs by q-value or other criteria. This filtering must happen before export, because FlashLFQ assumes that its input PSMs have already been filtered.
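The q-value criterion is a plain row filter on the PSM table. The following is a minimal pandas sketch on a hypothetical toy table, assuming that keep="lt" in mm.pp.add_filter means "strictly less than" (our reading of the parameter name, not something confirmed by the msmu documentation); it illustrates the rule, not msmu's implementation.

```python
import pandas as pd

# Hypothetical toy PSM table; column names loosely mirror the msmu PSM
# columns (stripped_peptide, charge, q_value), values are invented.
psms = pd.DataFrame(
    {
        "stripped_peptide": ["PEPTIDEK", "LESSK", "AAGGR", "PEPTIDEK"],
        "charge": [2, 2, 3, 3],
        "q_value": [0.001, 0.02, 0.009, 0.01],
    }
)

# Keep PSMs whose q-value is strictly below 1% (a "less than" rule).
q_threshold = 0.01
filtered = psms[psms["q_value"] < q_threshold]

print(f"kept {len(filtered)} of {len(psms)} PSMs")
```

Note that with a strict "less than" rule, a PSM sitting exactly at q = 0.01 is discarded.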
# Sage search results and sample metadata for the PXD012986 example
base_dir = "https://raw.githubusercontent.com/bertis-informatics/msmu/refs/heads/main/data/sage_lfq"
sage_idents = f"{base_dir}/sage/results.sage.tsv"
meta = f"{base_dir}/meta.csv"
# Read the Sage identifications as label-free PSM data
mdata = mm.read_sage(identification_file=sage_idents, label="label_free")
# Join sample metadata onto mdata.obs by sample_id, then push it to the modalities
meta_df = pd.read_csv(meta)
meta_df = meta_df.set_index("sample_id")
mdata.obs = mdata.obs.join(meta_df)
mdata.push_obs()
mdata.obs
# Register a q-value < 0.01 (1% FDR) filter on the PSM modality, then apply it
mdata = mm.pp.add_filter(mdata, modality="psm", column="q_value", keep="lt", value=0.01)
mdata = mm.pp.apply_filter(mdata, modality="psm")
mdata
INFO - Identification file loaded: (5000, 40)
INFO - Decoy entries separated: (345, 15)
MuData object with n_obs × n_vars = 6 × 4336
  obs: 'set', 'sample_name', 'condition', 'replicate'
  uns: '_cmd'
  1 modality
    psm: 6 x 4336
      obs: 'set', 'sample_name', 'condition', 'replicate'
      var: 'proteins', 'peptide', 'stripped_peptide', 'filename', 'scan_num', 'charge', 'peptide_length', 'missed_cleavages', 'semi_enzymatic', 'contaminant', 'PEP', 'q_value', 'rt', 'calcmass'
      uns: 'level', 'search_engine', 'quantification', 'label', 'acquisition', 'identification_file', 'quantification_file', 'decoy', 'filter', 'decoy_filter'
      varm: 'search_result', 'filter'
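Attaching the metadata relies on the index of mdata.obs matching the sample_id column of meta.csv. The following is a minimal pandas sketch of that index-aligned left join; the sample IDs and metadata values are hypothetical, and only the column names echo the obs columns in the output above.

```python
import pandas as pd

# Hypothetical obs table indexed by sample_id, with no columns yet.
obs = pd.DataFrame(index=pd.Index(["s1", "s2", "s3"], name="sample_id"))

# Hypothetical metadata, deliberately listed in a different row order.
meta_df = pd.DataFrame(
    {
        "sample_id": ["s3", "s1", "s2"],
        "condition": ["B", "A", "B"],
        "replicate": [1, 1, 2],
    }
).set_index("sample_id")

# DataFrame.join is a left join on the index: every obs row is kept in its
# original order, and metadata is matched by sample_id, not by row position.
obs = obs.join(meta_df)
print(obs)
```

Because matching is by label, a sample_id missing from meta.csv would yield NaN metadata for that sample rather than an error, so it is worth checking for missing values after the join.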
Export FlashLFQ Input File
After filtering, export the PSMs in FlashLFQ input format with the mm.io.write_flashlfq_input function.
mm.io.write_flashlfq_input(mdata, "flashlfq_input.tsv")
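For orientation, the FlashLFQ input is a tab-separated table with one row per identification. The following is a minimal pandas sketch of such an export. It assumes FlashLFQ's generic input columns (File Name, Base Sequence, Full Sequence, Peptide Monoisotopic Mass, Scan Retention Time, Precursor Charge, Protein Accession), and the mapping from msmu's PSM columns is our assumption. This is not the implementation of mm.io.write_flashlfq_input; the PSM values and the output file name are hypothetical.

```python
import pandas as pd

# Hypothetical filtered PSMs. Column names follow the msmu PSM var columns
# (filename, stripped_peptide, peptide, calcmass, rt, charge, proteins);
# the values are invented for illustration.
psms = pd.DataFrame(
    {
        "filename": ["run01.mzML", "run02.mzML"],
        "stripped_peptide": ["PEPTIDEK", "LESSK"],
        "peptide": ["PEPTIDEK", "LESSK"],
        "calcmass": [927.4549, 562.2962],  # neutral monoisotopic masses
        "rt": [25.3, 31.8],  # assumed to be in minutes
        "charge": [2, 2],
        "proteins": ["P12345", "Q67890"],
    }
)

# Assumed mapping onto FlashLFQ's generic identification columns.
column_map = {
    "filename": "File Name",
    "stripped_peptide": "Base Sequence",
    "peptide": "Full Sequence",
    "calcmass": "Peptide Monoisotopic Mass",
    "rt": "Scan Retention Time",
    "charge": "Precursor Charge",
    "proteins": "Protein Accession",
}
flashlfq_input = psms[list(column_map)].rename(columns=column_map)

# Write a tab-separated file without the pandas index column.
flashlfq_input.to_csv("flashlfq_input_sketch.tsv", sep="\t", index=False)
```

Selecting with list(column_map) before renaming drops any extra PSM columns, so the file contains only the columns FlashLFQ is expected to read.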
Run FlashLFQ (optional in this tutorial)
After exporting the FlashLFQ input file, run FlashLFQ with appropriate parameters (e.g., MBR) to quantify peptides.
The command-line examples below show how to run FlashLFQ on Linux, either directly with dotnet or through Docker. Adjust the parameters to your experimental design, following the FlashLFQ documentation.
For this tutorial, you can skip this step and use the provided FlashLFQ quantification result file directly.
# bash
# dotnet CMD.dll --idt "flashlfq_input.tsv" --rep "/path/to/spectra/directory/" --ppm 5 --chg
# or using Docker
# docker run --rm -v /path/to/local/directory:/data smithchemwisc/flashlfq:1.0.3 \
# --idt "/data/flashlfq_input.tsv" \
# --rep "/data/spectra/" \
# --ppm 5 \
# --chg
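If you prefer to launch FlashLFQ from the same Python session, the command can be assembled and run with subprocess. The following is a minimal sketch that assumes the dotnet/CMD.dll invocation and exactly the flags from the example above; build_flashlfq_cmd and RUN_FLASHLFQ are hypothetical names, and no other FlashLFQ options are implied.

```python
import shlex
import shutil
import subprocess

RUN_FLASHLFQ = False  # hypothetical switch: set to True to actually launch FlashLFQ


def build_flashlfq_cmd(idt, rep, ppm=5, chg=True, cmd_dll="CMD.dll"):
    """Assemble the dotnet command from the example above (hypothetical helper)."""
    cmd = ["dotnet", cmd_dll, "--idt", idt, "--rep", rep, "--ppm", str(ppm)]
    if chg:
        # Passed as a bare switch, exactly as in the example above.
        cmd.append("--chg")
    return cmd


cmd = build_flashlfq_cmd("flashlfq_input.tsv", "/path/to/spectra/directory/")
print(shlex.join(cmd))

# Launch only when requested and when the dotnet runtime is on PATH;
# check=True raises CalledProcessError if FlashLFQ exits with an error.
if RUN_FLASHLFQ and shutil.which("dotnet") is not None:
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems with paths that contain spaces.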
Attach FlashLFQ Results to mdata
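Peptide-level quantification results from FlashLFQ are attached to mdata with the mm.utils.add_quant function and quant_tool="flashlfq". The input is FlashLFQ's QuantifiedPeptides.tsv file, which contains peptide-level quantification values and evidence. Before attaching, mm.pp.to_peptide builds peptide-level identifications from the filtered PSMs. Afterwards, the peptide intensities are log2-transformed with mm.pp.log2_transform.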
flashlfq_dir = "https://raw.githubusercontent.com/bertis-informatics/msmu/refs/heads/main/data/flashlfq"
flashlfq_peptides = f"{flashlfq_dir}/QuantifiedPeptides.tsv"
# Summarize the filtered PSMs to peptide-level identifications
mdata = mm.pp.to_peptide(mdata)
# Attach FlashLFQ peptide intensities as the 'peptide' modality
mdata = mm.utils.add_quant(mdata, quant_data=flashlfq_peptides, quant_tool="flashlfq")
# Log2-transform the peptide intensities
mdata = mm.pp.log2_transform(mdata, modality="peptide")
mdata
INFO - Peptide-level identifications: 3683 (3664 at 1% FDR)
Building new peptide quantification data.
INFO - Added quantification modality 'peptide' using flashlfq data.
INFO - Quantification data shape: (3547, 6)
MuData object with n_obs × n_vars = 6 × 7883
  obs: 'set', 'sample_name', 'condition', 'replicate'
  uns: '_cmd'
  2 modalities
    psm: 6 x 4336
      obs: 'set', 'sample_name', 'condition', 'replicate'
      var: 'proteins', 'peptide', 'stripped_peptide', 'filename', 'scan_num', 'charge', 'peptide_length', 'missed_cleavages', 'semi_enzymatic', 'contaminant', 'PEP', 'q_value', 'rt', 'calcmass'
      uns: 'level', 'search_engine', 'quantification', 'label', 'acquisition', 'identification_file', 'quantification_file', 'decoy', 'filter', 'decoy_filter'
      varm: 'search_result', 'filter'
    peptide: 6 x 3547
      obs: 'set', 'sample_name', 'condition', 'replicate'
      uns: 'level'
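To illustrate the shape of the attached values, the following minimal pandas sketch turns FlashLFQ peptide intensities into a log2 samples × peptides matrix, the orientation of the msmu modalities above. It assumes that QuantifiedPeptides.tsv has a Sequence column and one Intensity_<sample> column per run, and that a zero intensity means the peptide was not quantified. The rows and sample names are hypothetical, and this is an illustration, not the implementation of mm.utils.add_quant or mm.pp.log2_transform.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt of a FlashLFQ QuantifiedPeptides.tsv; we assume one
# "Intensity_<sample>" column per run. Values are invented.
quant = pd.DataFrame(
    {
        "Sequence": ["PEPTIDEK", "LESSK"],
        "Intensity_run01": [1024.0, 0.0],
        "Intensity_run02": [2048.0, 512.0],
    }
).set_index("Sequence")

# Keep only the intensity columns and strip the prefix to recover sample names.
intensity = quant.filter(like="Intensity_")
intensity.columns = intensity.columns.str.removeprefix("Intensity_")

# Treat a zero intensity as missing before log2, so it becomes NaN, not -inf.
log2_intensity = np.log2(intensity.replace(0, np.nan))

# Transpose to samples x peptides.
log2_intensity = log2_intensity.T
print(log2_intensity)
```

Mapping zeros to NaN before the transform keeps "not quantified" distinct from a real low intensity and avoids -inf values in downstream statistics.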