API

gopher.read_encyclopedia(proteins_txt)

Read results from EncyclopeDIA.

PARAMETER DESCRIPTION
proteins_txt

The EncyclopeDIA protein output.

TYPE: str

RETURNS DESCRIPTION
DataFrame

The EncyclopeDIA results in a format for gopher.

gopher.read_metamorpheus(proteins_txt)

Read results from Metamorpheus.

PARAMETER DESCRIPTION
proteins_txt

The Metamorpheus protein output file.

TYPE: str

RETURNS DESCRIPTION
DataFrame

The Metamorpheus results in a format for gopher.

gopher.read_diann(proteins_tsv)

Read a DIANN-generated TSV file (pg_matrix), process it, and return a cleaned pandas DataFrame.

The function:

- Extracts the first protein accession from the “Protein.Ids” column to use as the DataFrame index.
- Renames the index axis to “Protein”.
- Drops unnecessary metadata columns.

PARAMETER DESCRIPTION
proteins_tsv

Path to the DIANN-generated TSV file. Expected columns: ‘Protein.Group’, ‘Protein.Ids’, ‘Protein.Names’, ‘Genes’, ‘First.Protein.Description’,

TYPE: PathLike

RETURNS DESCRIPTION
DataFrame

A DataFrame with the processed protein data, indexed by the first protein accession. The index is taken from the “Protein.Ids” column, and all remaining columns are the MSR columns.
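The cleaning steps described for `read_diann` can be sketched with the standard library alone. This is an illustration under stated assumptions, not gopher's implementation: the helper `clean_pg_matrix`, the sample column names, and the intensity values are hypothetical, and the sketch assumes that accessions in “Protein.Ids” are separated by semicolons. It returns a plain dictionary rather than a pandas DataFrame.

```python
import csv
import io

# Hypothetical two-row pg_matrix excerpt; sample names and values are made up.
PG_MATRIX = (
    "Protein.Group\tProtein.Ids\tProtein.Names\tGenes\t"
    "First.Protein.Description\tsample_1.raw\tsample_2.raw\n"
    "P04406\tP04406;E7EUT5\tG3P_HUMAN\tGAPDH\t"
    "Glyceraldehyde-3-phosphate dehydrogenase\t1200.5\t980.0\n"
    "P60709\tP60709\tACTB_HUMAN\tACTB\tActin, cytoplasmic 1\t5000.0\t\n"
)

# Metadata columns named in the parameter description above.
METADATA = {
    "Protein.Group",
    "Protein.Ids",
    "Protein.Names",
    "Genes",
    "First.Protein.Description",
}


def clean_pg_matrix(handle):
    """Return {first accession: {quantity column: value or None}}."""
    cleaned = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        # Assumption: accessions in "Protein.Ids" are separated by ";".
        protein = row["Protein.Ids"].split(";")[0]
        # Keep only the non-metadata columns; empty cells become None.
        cleaned[protein] = {
            col: float(val) if val else None
            for col, val in row.items()
            if col not in METADATA
        }
    return cleaned


table = clean_pg_matrix(io.StringIO(PG_MATRIX))
```

Here `table` is keyed by the first accession of each protein group, and each entry holds only the per-run quantity columns, mirroring the index and column layout described above.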

gopher.test_enrichment(proteins, desc=True, aspect='all', species='human', release='current', go_subset=None, contaminants_filter=None, fetch=False, progress=False, annotations=None, mapping=None, aggregate_terms=True)

Test for the enrichment of Gene Ontology terms from protein abundance.

The Mann-Whitney U test is applied to each column of the proteins dataframe for each Gene Ontology (GO) term. The p-values are then corrected for multiple hypothesis testing across all of the columns using the Benjamini-Hochberg procedure.
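To make the test concrete, here is a minimal sketch of a one-sided Mann-Whitney U test comparing the abundances of proteins annotated with a term against all other proteins in one column. It is not gopher's code: the function names and the abundance values are hypothetical, and the p-value uses the plain normal approximation without tie or continuity correction, which a statistics library would normally refine.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the window over every value tied with the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_greater(in_term, background):
    """One-sided test that in_term values tend to exceed background values.

    Returns (U, p). The p-value comes from the normal approximation with
    no tie or continuity correction, so it is rough for small samples.
    """
    n1, n2 = len(in_term), len(background)
    ranks = average_ranks(list(in_term) + list(background))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return u, p


# Hypothetical abundances for proteins inside and outside one GO term.
in_term = [9.1, 8.7, 7.9]
background = [5.2, 4.8, 6.1, 3.3, 7.0]
u_stat, p_value = mann_whitney_greater(in_term, background)
```

Every annotated protein in this toy column outranks every background protein, so U takes its maximum value of `len(in_term) * len(background)` and the p-value is small. Swapping the two arguments tests the opposite direction.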

PARAMETER DESCRIPTION
proteins

A dataframe where the indices are UniProt accessions and each column is an experiment to test. The values in this dataframe should be some measure of protein abundance: these could be the raw measurement if originating from a single sample or a fold-change/p-value if looking at the difference between two conditions.

TYPE: DataFrame

desc

Rank proteins in descending order?

TYPE: bool DEFAULT: True

aspect

The Gene Ontology aspect to use. Use “cc” for “Cellular Component”, “mf” for “Molecular Function”, “bp” for “Biological Process”, or “all” for all three.

TYPE: (str, {cc, mf, bp, all}) DEFAULT: 'all'

species

The species for which to retrieve GO annotations. If not “human” or “yeast”, see here.

TYPE: str, {"human", "yeast", ...}, optional. DEFAULT: 'human'

release

The Gene Ontology release version. Using “current” will look up the most current version.

TYPE: str DEFAULT: 'current'

go_subset

The GO terms of interest. Should consist of GO term names such as ‘nucleus’ or ‘cytoplasm’.

DEFAULT: None

contaminants_filter

A list of UniProt accessions for common contaminants, such as keratin, to filter out.

DEFAULT: None

fetch

Download the GO annotations even if they have been downloaded before?

TYPE: bool DEFAULT: False

progress

Show a progress bar during enrichment tests?

TYPE: bool DEFAULT: False

annotations

A custom annotations dataframe.

DEFAULT: None

mapping

A custom mapping of the GO term relationships.

DEFAULT: None

aggregate_terms

Aggregate the terms and do the tree search.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
DataFrame

The adjusted p-value for each tested GO term in each sample.
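The adjusted p-values come from the Benjamini-Hochberg procedure applied across all columns. The sketch below shows that procedure on a pooled set of raw p-values. It is an illustration only: `benjamini_hochberg`, the sample names, and the p-values are hypothetical, and gopher may rely on a library routine instead.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values keyed by (GO term, sample column).
raw = {
    ("nucleus", "sample_1"): 0.001,
    ("cytoplasm", "sample_1"): 0.04,
    ("nucleus", "sample_2"): 0.03,
    ("cytoplasm", "sample_2"): 0.5,
}

# Pool the p-values from every column before correcting, then map them back.
keys = list(raw)
adjusted = dict(zip(keys, benjamini_hochberg([raw[k] for k in keys])))
```

Because the correction pools every term and column, the adjusted value for a given term depends on how many tests were run in total, not only on the tests within its own column.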

gopher.get_data_dir()

Retrieve the current data directory for gopher.

gopher.set_data_dir(path=None)

Set the gopher data directory.

PARAMETER DESCRIPTION
path

The path for gopher to use as its data directory.

TYPE: str or pathlib.Path object DEFAULT: None