1, How was the FunRich database created?

For protein domains, the SMART database [1] was used for the entire human proteome. Gene Ontology annotations (biological process, cellular component and molecular function) were taken from the Gene Ontology database, HPRD [2], Entrez Gene [3] and UniProt [4]. For protein-protein interactions, the BioGRID [5], IntAct [6], Human Proteinpedia [7] and HPRD datasets were downloaded and mapped to Entrez Gene or UniProt accession identifiers. The respective datasets were parsed with customized Perl scripts.

Sites of expression (cell lines, normal and disease tissues) were collected from the HPRD, UniProt, Human Protein Atlas (HPA) [8], Human Proteome Browser [9], Human Proteome Map [10], ProteomicsDB [11] and Human Proteinpedia [7] databases. Protein annotations pertaining to pathways were collected from the Reactome [12], NCI-Nature Pathway Interaction Database [13], Cell Map [14] and HumanCyc [15] databases. Data for transcription factors were collected from 29 mammalian genome projects [16], while for disease terms the clinical synopsis phenotypic terms were downloaded from the OMIM database [17]. In addition, the semi-quantitative human proteome datasets compiled by Human Proteome Map and ProteomicsDB were collated. Similarly, protein annotations were downloaded from mass spectrometry (PRIDE [18], PeptideAtlas [19], Peptidome [20] and Human Proteinpedia), immunohistochemistry (HPA and Human Proteinpedia), exosome (ExoCarta [21], Vesiclepedia [22]), colorectal cancer (Colorectal Cancer Database), plasma (Plasma Proteome Database [23]) and post-translational modification (PhosphoSitePlus [24], HPRD, Human Proteinpedia and UniProt) databases and parsed with customized Perl scripts. As no two databases provided download files in the same format or with the same accession identifiers, the Perl scripts were customized for every single database file.
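The idea of one customized parser per source, all emitting records keyed by a common identifier, can be sketched as follows. This is only an illustration written in Python, not the Perl scripts used to build FunRich. The two file layouts, the column names, the identifiers (G1, ACC1, ACC2) and the function names are all hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, NamedTuple, Set, Tuple


class Annotation(NamedTuple):
    gene_id: str  # common identifier every source is mapped to
    source: str   # database the record came from
    term: str     # annotation text (pathway, tissue, domain, ...)


def parse_source_a(handle) -> Iterator[Annotation]:
    """Hypothetical source A: tab-delimited with a header, one term per row."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield Annotation(row["gene_id"].strip(), "source_a", row["pathway"].strip())


def parse_source_b(handle, id_map: Dict[str, str]) -> Iterator[Annotation]:
    """Hypothetical source B: 'accession|term1;term2' using a different accession."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        accession, terms = line.split("|", 1)
        gene_id = id_map.get(accession)
        if gene_id is None:
            continue  # skip records that cannot be mapped to the common identifier
        for term in terms.split(";"):
            if term.strip():
                yield Annotation(gene_id, "source_b", term.strip())


def merge(*streams) -> Dict[str, Set[Tuple[str, str]]]:
    """Collect (source, term) pairs from all parsers under the common identifier."""
    merged: Dict[str, Set[Tuple[str, str]]] = {}
    for stream in streams:
        for rec in stream:
            merged.setdefault(rec.gene_id, set()).add((rec.source, rec.term))
    return merged


# Toy inputs standing in for two downloaded files in different formats.
file_a = io.StringIO("gene_id\tpathway\nG1\tApoptosis\n")
file_b = io.StringIO("# comment\nACC1|Nucleus;Cytoplasm\nACC2|Unknown\n")
id_map = {"ACC1": "G1"}  # accession-to-common-identifier mapping (made up)

merged = merge(parse_source_a(file_a), parse_source_b(file_b, id_map))
print(merged)
```

Each parser hides the quirks of its own file format, so the merging step only ever sees uniform records.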

References


1. Letunic, I., Doerks, T. & Bork, P. SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Res. 40, D302-305 (2012).

2. Keshava Prasad, T.S. et al. Human Protein Reference Database--2009 update. Nucleic Acids Res. 37, D767-772 (2009).

3. Maglott, D., Ostell, J., Pruitt, K.D. & Tatusova, T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 35, D26-31 (2007).

4. UniProt Consortium. The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res. 38, D142-148 (2010).

5. Stark, C. et al. The BioGRID Interaction Database: 2011 update. Nucleic Acids Res. 39, D698-704 (2011).

6. Aranda, B. et al. The IntAct molecular interaction database in 2010. Nucleic Acids Res. 38, D525-531 (2010).

7. Mathivanan, S. et al. Human Proteinpedia enables sharing of human protein data. Nat. Biotechnol. 26, 164-167 (2008).

8. Uhlen, M. et al. Towards a knowledge-based Human Protein Atlas. Nat. Biotechnol. 28, 1248-1250 (2010).

9. Mathivanan, S. Integrated bioinformatics analysis of the publicly available protein data shows evidence for 96% of the human proteome. J. Proteomics Bioinform. 7, 41-49 (2014).

10. Kim, M.S. et al. A draft map of the human proteome. Nature 509, 575-581 (2014).

11. Wilhelm, M. et al. Mass-spectrometry-based draft of the human proteome. Nature 509, 582-587 (2014).

12. Croft, D. et al. The Reactome pathway knowledgebase. Nucleic Acids Res. 42, D472-477 (2014).

13. Schaefer, C.F. et al. PID: the Pathway Interaction Database. Nucleic Acids Res. 37, D674-679 (2009).

14. Cerami, E.G. et al. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res. 39, D685-690 (2011).

15. Caspi, R. et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res. 42, D459-471 (2014).

16. Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476-482 (2011).

17. Amberger, J., Bocchini, C. & Hamosh, A. A new face and new challenges for Online Mendelian Inheritance in Man (OMIM®). Hum. Mutat. 32, 564-567 (2011).

18. Vizcaino, J.A. et al. A guide to the Proteomics Identifications Database proteomics data repository. Proteomics 9, 4276-4283 (2009).

19. Deutsch, E.W. The PeptideAtlas Project. Methods Mol. Biol. 604, 285-296 (2010).

20. Slotta, D.J., Barrett, T. & Edgar, R. NCBI Peptidome: a new public repository for mass spectrometry peptide identifications. Nat. Biotechnol. 27, 600-601 (2009).

21. Simpson, R.J., Kalra, H. & Mathivanan, S. ExoCarta as a resource for exosomal research. J. Extracell. Vesicles 1, 18374 (2012).

22. Kalra, H. et al. Vesiclepedia: a compendium for extracellular vesicles with continuous community annotation. PLoS Biol. 10, e1001450 (2012).

23. Muthusamy, B. et al. Plasma Proteome Database as a resource for proteomics research. Proteomics 5, 3531-3536 (2005).

24. Hornbeck, P.V., Chabra, I., Kornhauser, J.M., Skrzypek, E. & Zhang, B. PhosphoSite: A bioinformatics resource dedicated to physiological protein phosphorylation. Proteomics 4, 1551-1561 (2004).



2, What statistical methods are used in FunRich?

FunRich uses the hypergeometric test to assess enrichment, and the Benjamini-Hochberg (BH) and Bonferroni methods to correct for multiple testing.
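For readers unfamiliar with these methods, here is a minimal Python sketch of a hypergeometric enrichment test followed by Bonferroni and BH adjustment. It is not FunRich's own code. The one-sided (over-representation) form of the test, the function names and the example counts are assumptions made for illustration.

```python
from math import comb
from typing import List


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when drawing n genes from a background of N, K of them annotated.

    k is the number of annotated genes observed in the user's list.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def bonferroni(pvals: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """BH adjusted p-values: p * m / rank, made monotone from the largest rank down."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up counts: (hits in list, list size, annotated in background, background size)
terms = [(8, 50, 100, 20000), (2, 50, 400, 20000), (1, 50, 1500, 20000)]
raw = [hypergeom_pvalue(*t) for t in terms]
print(raw)
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

Bonferroni controls the family-wise error rate and is the stricter of the two. BH controls the false discovery rate and usually retains more terms when many categories are tested.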