Protein Residue Corpora
This page lists a collection of corpora for the extraction and annotation of protein residue mentions in text, covering both plain amino acid mentions and mutation sites.
The main SourceForge download page is http://sourceforge.net/projects/bionlp-corpora/files/ProteinResidue
A set of 100 abstracts annotated by Kevin Nagel with
(protein, residue, organism) triples.
Protein Residue Full Text Corpus:
Nagel K (2009) Automatic functional annotation of predicted
active sites: combining PDB and literature mining. Cambridge,
UK: University of Cambridge.
A set of annotations of amino acid residues and mutations over
a full-text corpus. The PMIDs of the source texts are
provided; the source texts themselves are not, due to copyright
restrictions.
Protein Residue Relations Silver Corpus:
This package includes annotations of protein-residue relations
in 1520 PubMed abstracts, as well as the source text. This
corpus is considered to be a "silver standard" corpus rather
than a gold standard as the annotations were automatically
generated and validated using physical information from the
Protein Data Bank.
The package ending in "_A1" is in the A1 format of the BRAT
annotation tool (http://brat.nlplab.org/). Thanks to S.V. Ramanam
of NPJoint (http://npjoint.com/Cocoa_pre.html) for producing this
package.
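For readers who want to load the "_A1" package programmatically, the following is a minimal sketch of a reader for text-bound annotation lines in the BRAT standoff format, where each such line has the form ID, TAB, "Type start end", TAB, covered text. It is not part of the corpus distribution. The names TextBound and parse_text_bound are hypothetical, and the labels "Protein" and "Residue" in the sample are assumptions chosen for illustration; the actual label names used in the package should be checked against the downloaded files.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TextBound:
    """One text-bound annotation (a 'T' line) in BRAT standoff format."""
    ann_id: str  # annotation identifier, e.g. "T1"
    label: str   # annotation type (label names in this corpus are not assumed)
    start: int   # character offset where the span begins
    end: int     # character offset just past the end of the span
    text: str    # covered text as stored in the annotation file


def parse_text_bound(lines: Iterable[str]) -> List[TextBound]:
    """Collect text-bound annotations, skipping every other line type."""
    annotations = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("T"):
            continue  # relations, events, notes, blank lines, etc.
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # malformed line: skipped rather than guessed at
        ann_id, type_and_offsets, text = parts[0], parts[1], parts[2]
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous spans are written "s1 e1;s2 e2"; this sketch keeps
        # only the outermost start and end offsets.
        fragments = [fragment.split() for fragment in offsets.split(";")]
        start = int(fragments[0][0])
        end = int(fragments[-1][1])
        annotations.append(TextBound(ann_id, label, start, end, text))
    return annotations


# Hypothetical sample lines; labels and offsets are illustrative only.
sample = [
    "T1\tProtein 0 8\tlysozyme\n",
    "T2\tResidue 22 27\tGlu35\n",
]
for ann in parse_text_bound(sample):
    print(ann.ann_id, ann.label, ann.start, ann.end, ann.text)
```

To read a real file, pass an open file handle instead of the sample list, for example parse_text_bound(open(path, encoding="utf-8")), where path points at one annotation file from the package.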
Ravikumar, K.E., Liu, H., Cohn, J.D., Wall, M.E., Verspoor,
K.M. (2011) "Pattern Learning Through Distant Supervision for
Extraction of Protein-Residue Associations in the Biomedical
Literature". The Tenth International Conference on Machine
Learning and Applications (ICMLA) 2011, Honolulu, Hawaii, USA.
Maintained by Helen L. Johnson.
This file last modified Tuesday, 03-Jul-2012 00:51:45 UTC