※ Data Statistics for EPSD, dbPAF and dbPPT

Content | EPSD | dbPAF | dbPPT

Known data
Species | 68 | 7 | 20
Phosphorylation proteins | 209,326 | 54,148 | 31,012
Experimentally determined p-sites | 1,616,804 | 483,001 | 82,175
pS | 1,085,095 | 318,016 | 64,089
pT | 394,648 | 108,615 | 14,632
pY | 137,061 | 56,370 | 3,454

Data integration
Data size | 14.1 GB | ~150 MB | ~200 MB
Integrated information | Basic information, protein sequence, nucleotide sequence, phosphorylation regulator, genetic variation & mutation, functional annotation, structure, physicochemical property, functional domain, disease-associated information, protein-protein interaction, drug-target relation, orthologous information, biological pathway, transcriptional regulation, mRNA expression, protein expression / proteomics and subcellular localization | Basic information, protein sequence | Basic information, protein sequence
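
As a quick consistency check on the statistics above, the pS, pT and pY counts of each database should add up to its reported number of experimentally determined p-sites. The short Python sketch below does that arithmetic; the dictionary layout and the helper name `residue_sum` are illustrative choices, and the numbers are copied from the table.

```python
# Site counts copied from the statistics table (pS, pT, pY and the reported total).
counts = {
    "EPSD":  {"pS": 1_085_095, "pT": 394_648, "pY": 137_061, "total": 1_616_804},
    "dbPAF": {"pS": 318_016, "pT": 108_615, "pY": 56_370, "total": 483_001},
    "dbPPT": {"pS": 64_089, "pT": 14_632, "pY": 3_454, "total": 82_175},
}


def residue_sum(entry):
    """Add the per-residue p-site counts of one database."""
    return entry["pS"] + entry["pT"] + entry["pY"]


for name, entry in counts.items():
    # Prints the database name and whether the per-residue counts match the total.
    print(name, residue_sum(entry) == entry["total"])
```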

※ Usage


The EPSD database was designed to be powerful and convenient to use. It provides a browse option and several search options.

1. Browse. You can browse the EPSD database by species.

Browse by species: Select one of the listed species to browse all of its proteins with p-sites.

2. Search. Five search options are provided, including substrate search, peptide search, advanced search, batch search and BLAST search.

(1) Substrate Search accepts one or multiple keywords (separated by a space character) to search the EPSD database. The search fields include EPSD ID, UniProt Accession, Protein Name, Protein Alias, Gene Name, Gene Alias and Species.

Example: Please click on the "Example" button to search "Serine/threonine-protein kinase PLK1" in Protein Name field. By clicking on the "Submit" button, the related proteins will be shown.

(2) Peptide Search accepts one phosphorylation peptide (with the character 'p' in front of the p-site) to search the EPSD database.

Example: Please click on the "Example" button to search "KKpTLCGTPNYIAPEVLSK" in Any species. By clicking on the "Submit" button, the related phosphorylation peptide will be shown.
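
To make the peptide notation concrete, here is a minimal Python sketch that splits a 'p'-annotated peptide into its plain sequence and the 1-based positions of the marked p-sites. The function name `parse_phosphopeptide` is hypothetical and not part of EPSD. The sketch assumes that the marker is a lowercase 'p' (an uppercase 'P' is proline) and that it is always followed by S, T or Y, the three residue types covered by the database.

```python
def parse_phosphopeptide(peptide):
    """Return (plain_sequence, p_site_positions) for a 'p'-annotated peptide.

    Assumption: a lowercase 'p' marks the residue that follows it as
    phosphorylated, and only S, T or Y may follow the marker.
    """
    sequence = []
    positions = []
    i = 0
    while i < len(peptide):
        ch = peptide[i]
        if ch == "p":
            if i + 1 >= len(peptide) or peptide[i + 1] not in "STY":
                raise ValueError("'p' must be followed by S, T or Y")
            sequence.append(peptide[i + 1])
            positions.append(len(sequence))  # 1-based position in the plain sequence
            i += 2
        else:
            sequence.append(ch)
            i += 1
    return "".join(sequence), positions


plain, sites = parse_phosphopeptide("KKpTLCGTPNYIAPEVLSK")
print(plain, sites)  # KKTLCGTPNYIAPEVLSK [3]
```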

(3) Advanced Search combines up to three terms to find information more specifically. A query field can be left empty if fewer terms are needed. The terms are connected by the following operators:

AND : the term following this operator must be included in the specified field(s).
OR : either the term preceding or the term following this operator should occur in the specified field(s).
NOT : the term following this operator must not be contained in the specified field(s).

Example: You can click on the "Example" button to load an instance that searches for the phosphoprotein "Serine/threonine-protein kinase PLK1" in Homo sapiens. The human Serine/threonine-protein kinase PLK1 will be shown by clicking on the "Submit" button.
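
The operator semantics can be illustrated with a small Python sketch. It is not the EPSD implementation: it assumes terms are evaluated from left to right, that matching is a case-insensitive substring test, and that empty query fields are skipped. The function `matches` and the toy `records` are hypothetical.

```python
def matches(record, terms):
    """Evaluate (operator, field, keyword) terms from left to right.

    The operator of the first non-empty term is ignored; terms with an
    empty keyword are skipped, mirroring query fields that are left blank.
    """
    result = None
    for operator, field, keyword in terms:
        if not keyword:
            continue
        hit = keyword.lower() in record.get(field, "").lower()
        if result is None:
            result = hit
        elif operator == "AND":
            result = result and hit
        elif operator == "OR":
            result = result or hit
        elif operator == "NOT":
            result = result and not hit
        else:
            raise ValueError(f"unknown operator: {operator!r}")
    return bool(result)


# Toy records for illustration only; they are not an export of the database.
records = [
    {"Protein Name": "Serine/threonine-protein kinase PLK1", "Species": "Homo sapiens"},
    {"Protein Name": "Cellular tumor antigen p53", "Species": "Homo sapiens"},
    {"Protein Name": "Serine/threonine-protein kinase PLK1", "Species": "Mus musculus"},
]
query = [
    (None, "Protein Name", "Serine/threonine-protein kinase PLK1"),
    ("AND", "Species", "Homo sapiens"),
    ("AND", "Gene Name", ""),  # empty third term is skipped
]
hits = [r for r in records if matches(r, query)]
print(hits)  # only the human PLK1 record remains
```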

(4) Batch Search could be used to find a number of proteins at once, such as a protein list. You can enter a list of keywords, for example a list of UniProt accessions, to search the database. The list should be entered with one keyword per line.

Example: You can click on the "Example" button to load three UniProt accessions: P04637, P53350 and F8WH10. By clicking on the "Submit" button, you can find "Cellular tumor antigen p53", "Serine/threonine-protein kinase PLK1" and "Vascular endothelial growth factor A".
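
If the keyword list comes from a script or spreadsheet, a small helper can put it into the one-keyword-per-line form before it is pasted into the batch search box. The Python sketch below is only a convenience under that assumption; `build_batch_query` is a hypothetical name, and removing blanks and duplicates is a design choice of this sketch, not a documented requirement of EPSD.

```python
def build_batch_query(keywords):
    """Return the keywords as text with one keyword per line.

    Surrounding whitespace, empty entries and repeated keywords are dropped;
    the original order is preserved.
    """
    seen = set()
    lines = []
    for keyword in keywords:
        keyword = keyword.strip()
        if keyword and keyword not in seen:
            seen.add(keyword)
            lines.append(keyword)
    return "\n".join(lines)


print(build_batch_query(["P04637", " P53350", "", "F8WH10", "P04637"]))
```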

(5) BLAST Search could be used to find a specific protein and/or its related homologues by sequence alignment. This search option helps you find the query protein quickly and accurately. Only one protein sequence in FASTA format is allowed at a time. The E-value threshold can be user-defined; the default is 0.01.

Example: You can click on the "Example" button to load the protein sequence of the Homo sapiens phosphoprotein PLK1. By clicking on the "Submit" button, you can find the related homologues.
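
Because only one FASTA sequence is accepted per BLAST search, it can help to check the input before submitting it. The following Python sketch shows one way to do so; `read_single_fasta` is a hypothetical helper, and the header and sequence in the usage lines are made-up placeholders rather than a real protein.

```python
def read_single_fasta(text):
    """Parse FASTA text that must contain exactly one record.

    Returns (header, sequence) and raises ValueError for empty input,
    a missing '>' header, several records, or an empty sequence.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    if sum(line.startswith(">") for line in lines) != 1:
        raise ValueError("only one sequence is allowed per BLAST search")
    header = lines[0][1:].strip()
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("the FASTA record contains no sequence")
    return header, sequence


# Placeholder record for illustration; not a real protein sequence.
header, sequence = read_single_fasta(">example_protein\nMKTAYIAKQR\nQISFVKSHFS\n")
print(header, len(sequence))  # example_protein 20
```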

3. Localization probability (LP) score

The localization probability (LP) score was computationally assigned to each potential p-site in phosphopeptides containing multiple serine, threonine or tyrosine residues, based on the cumulative binomial distribution (Olsen, et al., 2006). LP scores range from 0 to 1, and a higher LP score represents a higher cumulative binomial probability that a site is a real p-site (Olsen, et al., 2006). As previously described (Humphrey, et al., 2013), we classified the p-sites in the original phosphopeptides into four categories based on the pre-calculated LP scores: class I (>0.75), class II (≤0.75 and >0.5), class III (≤0.5 and ≥0.25), and class IV (<0.25).
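
The class boundaries above translate directly into a small lookup. The Python sketch below encodes them as stated in the text; the function name `lp_class` is an illustrative choice and not part of EPSD.

```python
def lp_class(score):
    """Map a localization probability (0 to 1) to its class label.

    Thresholds follow the text: class I (>0.75), class II (<=0.75 and >0.5),
    class III (<=0.5 and >=0.25), class IV (<0.25).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("LP scores range from 0 to 1")
    if score > 0.75:
        return "I"
    if score > 0.5:
        return "II"
    if score >= 0.25:
        return "III"
    return "IV"


for value in (0.9, 0.75, 0.5, 0.25, 0.1):
    print(value, lp_class(value))
```

Note that the boundary values fall into the lower class for 0.75 and 0.5 (class II and class III), while 0.25 still belongs to class III.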