Peregrine literature indexing service

About the resource

Type of resource: Text-mining software
License/availability: Open source


Peregrine is an indexing engine, or tagger: software that recognizes concepts in human-readable text by matching the text against a database (thesaurus) of known terms. It recognizes multi-word terms, and when a term can represent more than one concept, it attempts to disambiguate the term.

Peregrine was originally developed by Martijn Schuemie at the Department of Medical Informatics of the Erasmus University Medical Center (EMC) in Rotterdam. It has since been improved and recently released as open source, in collaboration with NBIC’s BioAssist Engineering Team. The source code is released under the AGPL license.

A public Peregrine web service is available. It uses Peregrine and the Intext-semantic package to recognize concepts in text that you supply, with an English-language biomedical ontology pre-loaded. The data-mining project at the Netherlands Bioinformatics Center contains Peregrine and several supporting modules, for example for ontologies and datasets.
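
The thesaurus-based recognition described above can be illustrated with a minimal Python sketch. This is not Peregrine's code or API: the thesaurus contents, the concept identifiers, and the function names `tokenize` and `tag` are all hypothetical, and the greedy longest-match strategy is only one common way to make multi-word terms take precedence over their parts. Disambiguation is not implemented; an ambiguous term simply returns every candidate concept.

```python
import re

# Hypothetical toy thesaurus: lower-cased token tuple -> candidate concept IDs.
# The IDs are made up for illustration and do not come from any real ontology.
THESAURUS = {
    ("breast", "cancer"): ["C001"],
    ("cancer",): ["C002"],
    ("brca1",): ["C003"],
    ("cold",): ["C004", "C005"],  # ambiguous term: two candidate concepts
}


def tokenize(text):
    """Return (lower-cased token, start offset, end offset) triples."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]


def tag(text, thesaurus):
    """Greedy longest-match lookup of thesaurus terms in text.

    Returns a list of (matched text, start, end, concept IDs) tuples.
    A match with more than one concept ID is ambiguous; a real tagger
    would apply a disambiguation step to it, which is omitted here.
    """
    tokens = tokenize(text)
    max_len = max((len(term) for term in thesaurus), default=0)
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so that "breast cancer"
        # wins over the shorter term "cancer".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            if key in thesaurus:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                matches.append((text[start:end], start, end, thesaurus[key]))
                i += n  # skip past the matched term
                break
        else:
            i += 1  # no term starts at this token
    return matches
```

Calling `tag("BRCA1 mutations raise breast cancer risk.", THESAURUS)` in this sketch reports "BRCA1" and "breast cancer" as matches, while the single-word term "cancer" is not reported separately because the longer term has already consumed it.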