Protein Residue Corpora

This page lists corpora for the extraction and annotation of protein residue mentions in text, covering both plain amino acid mentions and mutation sites. The main SourceForge download page is http://sourceforge.net/projects/bionlp-corpora/files/ProteinResidue

MutationFinder Corpora:

MutationFinder-1.1-Corpus.tar.gz

http://mutationfinder.sourceforge.net

https://sourceforge.net/projects/mutationfinder/files/MutationFinder/MutationFinder-1.1/MutationFinder-1.1.tar.gz/download

Nagel Corpus:

NagelCorpus.tar.gz

Nagel K (2009) Automatic functional annotation of predicted active sites: combining PDB and literature mining. Cambridge, UK: University of Cambridge.

Protein Residue Full Text Corpus:

ProteinResidueFullTextCorpus.tar.gz

A set of annotations of amino acid residues and mutations over a full-text corpus. The PMIDs of the source texts are provided; the source texts themselves are not distributed because of copyright restrictions.

Protein Residue Relations Silver Corpus:

ProteinResidueRelationsSilverCorpus.tar.gz

ProteinResidueRelationsSilverCorpus_A1.tar.gz

The package ending in "_A1" uses the A1 standoff format of the BRAT annotation tool (http://brat.nlplab.org/). Thanks to S.V. Ramanam of NPJoint (http://npjoint.com/Cocoa_pre.html) for producing this version.
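
For readers unfamiliar with the A1 format, the following is a minimal sketch of reading its text-bound annotation lines in Python. It assumes the general BRAT standoff layout (an ID beginning with "T", then a tab, then a label and character offsets, then a tab, then the covered text). The function name parse_text_bound, the labels "Residue" and "Protein", and the sample lines are hypothetical illustrations; the actual labels used in the _A1 package are not documented on this page and may differ.

```python
from typing import Iterable, List, NamedTuple, Tuple


class TextBound(NamedTuple):
    """One text-bound annotation (a "T" line) in BRAT standoff format."""
    ann_id: str
    label: str
    spans: List[Tuple[int, int]]  # (start, end) character offsets
    text: str


def parse_text_bound(lines: Iterable[str]) -> List[TextBound]:
    """Collect the text-bound annotations from standoff lines.

    Lines that do not start with "T" (relations, events, notes) are skipped.
    """
    annotations = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("T"):
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # malformed line, nothing to parse
        ann_id = parts[0]
        # The second field is "<label> <start> <end>", with further
        # fragments separated by ";" for discontinuous spans.
        label, _, offsets = parts[1].partition(" ")
        spans = []
        for fragment in offsets.split(";"):
            start, end = fragment.split()
            spans.append((int(start), int(end)))
        text = parts[2] if len(parts) > 2 else ""
        annotations.append(TextBound(ann_id, label, spans, text))
    return annotations


# Hypothetical sample over the sentence "Ser195 in trypsin";
# the labels are invented for illustration only.
sample = [
    "T1\tResidue 0 6\tSer195\n",
    "T2\tProtein 10 17\ttrypsin\n",
]

for ann in parse_text_bound(sample):
    print(ann.ann_id, ann.label, ann.spans, ann.text)
```

The sketch reads only text-bound annotations; any relation lines in the package would need separate handling.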

Ravikumar, K.E., Liu, H., Cohn, J.D., Wall, M.E., Verspoor, K.M. (2011) "Pattern Learning Through Distant Supervision for Extraction of Protein-Residue Associations in the Biomedical Literature". The Tenth International Conference on Machine Learning and Applications (ICMLA 2011), Honolulu, Hawaii, USA, December 2011.

Maintained by Helen L. Johnson.