API
gopher.read_encyclopedia(proteins_txt)
Read results from EncyclopeDIA.
| PARAMETER | DESCRIPTION |
|---|---|
| proteins_txt | The EncyclopeDIA protein output. |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | The EncyclopeDIA results in a format for gopher. |
gopher.read_metamorpheus(proteins_txt)
Read results from MetaMorpheus.

| PARAMETER | DESCRIPTION |
|---|---|
| proteins_txt | The MetaMorpheus protein output file. |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | The MetaMorpheus results in a format for gopher. |
gopher.read_diann(proteins_tsv)
Read a DIA-NN-generated TSV file (pg_matrix), process it, and return a cleaned pandas DataFrame.

The function:

- Extracts the first protein accession from the “Protein.Ids” column to use as the DataFrame index.
- Renames the index axis to “Protein”.
- Drops unnecessary metadata columns.

| PARAMETER | DESCRIPTION |
|---|---|
| proteins_tsv | Path to the DIA-NN-generated TSV file. Expected columns include ‘Protein.Group’, ‘Protein.Ids’, ‘Protein.Names’, ‘Genes’, and ‘First.Protein.Description’. |

| RETURNS | DESCRIPTION |
|---|---|
| pd.DataFrame | A DataFrame with the processed protein data, indexed by the first protein accession. The index is taken from the “Protein.Ids” column, and all remaining columns are the MSR columns. |
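The three processing steps listed for `read_diann` can be sketched directly in pandas. This is an illustration of the described behavior, not gopher's actual implementation: the helper name `clean_pg_matrix`, the sample column names, and the abundance values are hypothetical, and it assumes that “Protein.Ids” holds semicolon-separated accessions.

```python
import io

import pandas as pd

# Made-up stand-in for a pg_matrix file; sample columns and values are hypothetical.
tsv = io.StringIO(
    "Protein.Group\tProtein.Ids\tProtein.Names\tGenes\t"
    "First.Protein.Description\tsample_1\tsample_2\n"
    "P04637\tP04637;Q12888\tP53_HUMAN\tTP53\tCellular tumor antigen p53\t10.5\t11.0\n"
    "P60709\tP60709\tACTB_HUMAN\tACTB\tActin, cytoplasmic 1\t20.0\t19.5\n"
)

# Metadata columns named in the parameter description above.
METADATA = [
    "Protein.Group",
    "Protein.Ids",
    "Protein.Names",
    "Genes",
    "First.Protein.Description",
]


def clean_pg_matrix(handle):
    """Hypothetical helper mirroring the steps described for read_diann."""
    df = pd.read_csv(handle, sep="\t")
    # Step 1: use the first accession of each group as the index
    # (assumes a semicolon separator).
    df.index = df["Protein.Ids"].str.split(";").str[0]
    # Step 2: rename the index axis to "Protein".
    df = df.rename_axis("Protein")
    # Step 3: drop the metadata columns, leaving only the quantitative columns.
    return df.drop(columns=METADATA)


cleaned = clean_pg_matrix(tsv)
```

The resulting frame has one row per protein group and one column per sample, which matches the input shape that `test_enrichment` expects.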
gopher.test_enrichment(proteins, desc=True, aspect='all', species='human', release='current', go_subset=None, contaminants_filter=None, fetch=False, progress=False, annotations=None, mapping=None, aggregate_terms=True)
Test for the enrichment of Gene Ontology terms from protein abundance.
The Mann-Whitney U test is applied to each column of the `proteins` dataframe for each Gene Ontology (GO) term. The p-values are then corrected for multiple hypothesis testing across all of the columns using the Benjamini-Hochberg procedure.
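To make the two statistical steps concrete, here is a minimal standard-library sketch. It is not gopher's implementation. It assumes a one-sided test (proteins annotated with a term rank higher than the rest) and uses a normal approximation without tie or continuity correction. The function names, protein labels, abundances, and term memberships are all made up for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_greater(in_term, not_in_term):
    """Return (U, p) for the one-sided alternative that in_term values are larger."""
    n1, n2 = len(in_term), len(not_in_term)
    ranks = average_ranks(list(in_term) + list(not_in_term))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return u, 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up abundances for one experiment and made-up term memberships.
abundance = {"A": 7.0, "B": 6.0, "C": 5.0, "D": 4.0, "E": 3.0, "F": 2.0, "G": 1.0}
terms = {"term_1": {"A", "B", "C"}, "term_2": {"A", "G"}}

raw = {}
for term, members in terms.items():
    inside = [v for k, v in abundance.items() if k in members]
    outside = [v for k, v in abundance.items() if k not in members]
    raw[term] = mann_whitney_greater(inside, outside)[1]

adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
```

In this toy data, `term_1` contains the three most abundant proteins and so receives a small adjusted p-value, while `term_2` mixes the highest and lowest proteins and does not.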
| PARAMETER | DESCRIPTION |
|---|---|
| proteins | A dataframe where the indices are UniProt accessions and each column is an experiment to test. The values should be some measure of protein abundance: the raw measurement if originating from a single sample, or a fold-change or p-value if looking at the difference between two conditions. |
| desc | Rank proteins in descending order? |
| aspect | The Gene Ontology aspect to use. Use “cc” for “Cellular Component”, “mf” for “Molecular Function”, “bp” for “Biological Process”, or “all” for all three. |
| species | The species for which to retrieve GO annotations. If not “human” or “yeast”, see here. |
| release | The Gene Ontology release version. Using “current” will look up the most current version. |
| go_subset | The GO terms of interest. Should consist of GO term names such as ‘nucleus’ or ‘cytoplasm’. DEFAULT: `None` |
| contaminants_filter | A list of UniProt accessions for common contaminants, such as keratin, to filter out. DEFAULT: `None` |
| fetch | Download the GO annotations even if they have been downloaded before? |
| progress | Show a progress bar during enrichment tests? |
| annotations | A custom annotations dataframe. DEFAULT: `None` |
| mapping | A custom mapping of the GO term relationships. DEFAULT: `None` |
| aggregate_terms | Aggregate the terms and do the tree search. |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | The adjusted p-value for each tested GO term in each sample. |
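As a usage sketch, the input for `test_enrichment` can be built as a small pandas DataFrame indexed by UniProt accession. The abundance values and sample names below are made up, and the wrapper `run_enrichment` is a hypothetical helper. The call itself uses only parameters from the signature above, and it assumes the package is importable as `gopher`. Running it may download GO annotations.

```python
import pandas as pd

# Made-up abundances; the index holds UniProt accessions, one column per experiment.
proteins = pd.DataFrame(
    {
        "sample_1": [10.5, 20.0, 3.2, 8.8],
        "sample_2": [11.0, 19.5, 2.9, 9.1],
    },
    index=["P04637", "P60709", "P68871", "P02768"],
)


def run_enrichment(proteins):
    """Hypothetical wrapper around the documented test_enrichment call."""
    # Imported lazily so the input can be built and inspected without gopher.
    import gopher

    return gopher.test_enrichment(
        proteins,
        desc=True,  # rank proteins from most to least abundant
        aspect="cc",  # restrict to one GO aspect
        species="human",
        go_subset=["nucleus", "cytoplasm"],  # GO term names of interest
        progress=True,
    )
```

Calling `run_enrichment(proteins)` should return a DataFrame of adjusted p-values for each tested GO term in each sample, as described in the returns table.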
gopher.get_data_dir()
Retrieve the current data directory for ppx.
gopher.set_data_dir(path=None)
Set the ppx data directory.
| PARAMETER | DESCRIPTION |
|---|---|
| path | The path for ppx to use as its data directory. |