The Protein Design Group (PDG) protein-protein interaction corpus was originally created at the PDG in an idiosyncratic format. We refactored the corpus by converting the data into two established formats: WordFreak and Genia-style embedded XML. The refactored corpus (PICorpus) can be used for a variety of biomedical language processing (BLP) tasks, including testing entity extraction, relation identification, and relation extraction systems.
We are interested in your feedback about this corpus. Please direct all bug reports and comments about the contents of the corpus to the BioNLP-Corpora Bug Tracker. Be sure to choose "PICorpus" from the dropdown options in the "Category" field.
If you are interested in helping with this effort, please send a message to the PICorpus help/discussion forum. Be sure to include "PICorpus" in the subject line.