CREST: Corpus of Recommendation Strength

CREST is a collection of clinical guidelines annotated with instances of recommendations, each labelled with its strength of importance as specified by the guideline's authors. Because the data is drawn from many disparate authors, a unified scheme for labelling importance is defined, together with a mapping from each guideline's own scheme to the unified one.

For a detailed description of the corpus, please see the paper by Read, Velldal, Cavazza & Georg presented at LREC 2016, A Corpus of Clinical Practice Guidelines Annotated with the Importance of Recommendations:


The data is available for download from the following link:

The archive ‘crest.tgz’ contains the following:

  • partitions.xml
    The assignment of guidelines used in the experiments described in the paper above, with each guideline identifier assigned to either the heldout or the development partition (and, within the development partition, to a fold for cross-validation).
  • primary/
    HTML acquired from (named according to the guideline identifier).
  • schemes.xml
    The recommendation strength schemes used by individual guidelines, which also contain attributes mapping to the unified scheme described in the paper above.
  • xml/
    XML encoding of the recommendations section in guidelines, with explicit labels of importance removed from the text and instead indicated with XML attributes (named according to the guideline identifier).
    The same information as found on this page.
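
As a starting point for working with the archive, the following is a minimal Python sketch that unpacks it and inspects its contents. It relies only on what is listed above: the file names partitions.xml and schemes.xml, the xml/ directory, and the statement that files are named according to the guideline identifier. The XML element and attribute names are not documented on this page, so the sketch counts whatever tags it finds rather than assuming a schema. The helper names (extract_archive, find_member, summarise_xml, guideline_ids) are hypothetical and are not part of the corpus distribution.

```python
import tarfile
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Unpack a gzipped tarball into target_dir and return that directory."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target)
    return target


def find_member(root, name):
    """Find a file or directory called `name` anywhere below root.

    The archive's top-level layout is an unknown here, so the search is
    recursive rather than assuming a fixed path.
    """
    matches = sorted(Path(root).rglob(name))
    if not matches:
        raise FileNotFoundError(name)
    return matches[0]


def summarise_xml(xml_path):
    """Count element tags in one XML file without assuming any schema."""
    root = ET.parse(xml_path).getroot()
    return Counter(element.tag for element in root.iter())


def guideline_ids(directory):
    """Return sorted file stems, taken here to be guideline identifiers."""
    return sorted(p.stem for p in Path(directory).iterdir() if p.is_file())
```

For example, after `root = extract_archive("crest.tgz", "crest")`, calling `summarise_xml(find_member(root, "schemes.xml"))` would show which element names the schemes file actually uses, and `guideline_ids(find_member(root, "xml"))` would list the identifiers of the encoded guidelines.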


Please use the following citation when referencing the data:

  author = {Jonathon Read and Erik Velldal and 
            Marc Cavazza and Gersende Georg},
  title = {A Corpus of Clinical Practice Guidelines Annotated with the
           Importance of Recommendations},
  booktitle = {Proceedings of the Tenth International Conference on 
               Language Resources and Evaluation},
  pages = {1724--1731},
  year = {2016},
  address = {Portorož, Slovenia}