The Colorado Richly Annotated Full Text Corpus (CRAFT) is a manually
annotated corpus consisting of 67 full-text biomedical journal
articles. Each article is a member of the PubMed
Central Open Access Subset.
Annotation guidelines used during the construction of CRAFT:
The CRAFT annotations are licensed under the Creative Commons
Attribution 3.0 license (CC BY).
- November 22nd, 2016 -- Version 2.0 of the CRAFT corpus has been released. It adds manually curated coreference annotations.
- October 19th, 2012 -- Version 1.0 of the CRAFT corpus has been released.
- Version 1.0 contains updated versions of the Gene
Ontology Biological Process and Molecular Function annotations and
minor modifications to other annotations.
- May 27th, 2012 -- Version 0.9 of the CRAFT corpus has been released.
- Version 0.9 contains the complete CRAFT corpus, with one
exception: the Gene Ontology Biological Process and Molecular Function
annotations are undergoing a quality assurance review. Some of the GO
BP/MF annotations included in the v0.9 release will likely change as a
result of the Q/A review. When the review is complete, CRAFT v1.0 will be released.
To reference the
CRAFT corpus, please cite one of:
- Bada, M., Eckert, M., Evans, D., Garcia, K., Shipley, K., Sitnikov, D.,
Baumgartner Jr., W. A., Cohen, K. B., Verspoor, K., Blake, J. A., and
Hunter, L. E. Concept Annotation in the CRAFT Corpus. BMC Bioinformatics.
2012 Jul 9;13:161. doi: 10.1186/1471-2105-13-161. [PubMed:22776079]
- Verspoor, K.*, Cohen, K.B.*, Lanfranchi, A., Warner, C.,
Roeder, C., Choi, J.D., Funk, C., Malenkiy, Y., Eckert, M., Xue, N.,
Baumgartner Jr., W.A., Bada, M., Palmer, M., Hunter, L.E. A corpus
of full-text journal articles is a robust evaluation tool for revealing
differences in performance of biomedical natural language processing
tools. BMC Bioinformatics.
2012 Aug 17;13(1):207. [PubMed:22901054]
- K. Bretonnel Cohen, Arrick Lanfranchi, Miji Joo-young Choi, Michael Bada,
William A. Baumgartner Jr., Natalya Panteleyeva, Karin Verspoor,
Martha Palmer, Lawrence E. Hunter. Coreference annotation and resolution
in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical
journal articles. BMC Bioinformatics. 2016.
Accompanying the release of CRAFT is a software module
that integrates CRAFT with the Unstructured
Information Management Architecture (UIMA). The software module is
a Maven project. It includes a Collection Reader for the CRAFT corpus
as well as the annotations themselves (in the form of UIMA XMI).
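Because UIMA XMI is an XML serialization, the annotation files can also be inspected without UIMA itself. The following is a minimal sketch in Python, not part of the craft-code module. The sample document, the `demo` namespace, the `Mention` element name, and both helper functions are hypothetical and chosen only for illustration. The actual element names in the CRAFT XMI files depend on the type system used and should be checked against the release.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, hand-written XMI fragment for illustration only. The "demo"
# namespace and the "Mention" element are NOT taken from the CRAFT release.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
         xmlns:cas="http:///uima/cas.ecore"
         xmlns:demo="http:///example/demo.ecore"
         xmi:version="2.0">
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView"
            mimeType="text" sofaString="Mice lacking the gene."/>
  <demo:Mention xmi:id="2" sofa="1" begin="0" end="4"/>
  <demo:Mention xmi:id="3" sofa="1" begin="17" end="21"/>
</xmi:XMI>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def count_feature_structures(xmi_text):
    """Count the top-level elements of an XMI document by local tag name."""
    root = ET.fromstring(xmi_text)
    return Counter(local_name(child.tag) for child in root)


def covered_texts(xmi_text, type_name):
    """Return the document text covered by each element named type_name."""
    root = ET.fromstring(xmi_text)
    # The document text is stored on the Sofa element's sofaString attribute.
    sofa_text = ""
    for child in root:
        if local_name(child.tag) == "Sofa":
            sofa_text = child.get("sofaString", "")
    # begin/end are character offsets into the document text.
    return [
        sofa_text[int(el.get("begin")):int(el.get("end"))]
        for el in root
        if local_name(el.tag) == type_name
    ]


if __name__ == "__main__":
    print(count_feature_structures(SAMPLE_XMI))
    print(covered_texts(SAMPLE_XMI, "Mention"))
```

For real pipelines, the Collection Reader in the software module is the intended entry point. A plain XML pass like this one is only useful for quick sanity checks, such as counting annotations per type.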
LICENSE: The craft-code software module has been released
under the BSD license.
DOCUMENTATION: API 1.0
SOURCE CODE: Download craft-code-2.0
MAVEN COORDINATES (CCP type system):
<!-- craft collection reader using the ccp type system -->