GraphNER is a named entity recognizer that uses graph propagation to improve upon the BANNER and BANNER-ChemDNER systems.
BANNER and BANNER-ChemDNER are named entity recognition systems based on first-order and second-order linear-chain conditional random fields (CRFs). They formulate named entity recognition as a tagging task in which each entity type has distinct beginning and inside tags, and a single tag marks words that do not belong to any named entity. For example, if we are interested in genes, mutations, and diseases, we can use the tags B-GENE, I-GENE, B-MUTATION, I-MUTATION, B-DISEASE, I-DISEASE, and O.
CRF training is supervised, and the model ignores corpus-level similarities between words. GraphNER improves upon a CRF-based system by using a graph that encodes these similarities. The outputs of the CRF-based model are extracted in the form of posterior and transition probabilities. The posteriors are propagated over the graph vertices so that similar vertices receive similar label distributions, and the updated label distributions are then combined with the transition probabilities in a Viterbi algorithm.
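The pipeline described above (CRF posteriors, propagation over a similarity graph, then Viterbi decoding with the CRF transition probabilities) can be sketched as follows. This is a minimal illustration under stated assumptions, not GraphNER's implementation: the vertices (here, individual token occurrences), the edge weights, the simple averaging update with mixing weight `alpha`, the gene name `XYZ1`, and all probabilities are hypothetical. GraphNER's actual graph construction and propagation objective may differ.

```python
import math

# Hypothetical tag set: the BIO scheme with a single entity type.
TAGS = ["O", "B-GENE", "I-GENE"]


def normalize(dist):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(dist)
    return [x / total for x in dist]


def propagate(posteriors, edges, alpha=0.5, iterations=10):
    """Simple iterative label propagation (an assumed stand-in, not GraphNER's objective).

    posteriors: {vertex: [p(tag) for tag in TAGS]} taken from the CRF.
    edges: {vertex: [(neighbor, weight), ...]} similarity graph (symmetric here).
    Each vertex keeps a share `alpha` of its own CRF posterior and takes the
    rest from the weighted average of its neighbors' current distributions.
    """
    q = {v: list(p) for v, p in posteriors.items()}
    for _ in range(iterations):
        new_q = {}  # every vertex reads the previous iteration's q
        for v, prior in posteriors.items():
            nbrs = edges.get(v, [])
            total_w = sum(w for _, w in nbrs)
            if total_w == 0:  # isolated vertex: keep the CRF posterior
                new_q[v] = list(prior)
                continue
            avg = [sum(w * q[u][k] for u, w in nbrs) / total_w
                   for k in range(len(TAGS))]
            new_q[v] = normalize([alpha * prior[k] + (1 - alpha) * avg[k]
                                  for k in range(len(TAGS))])
        q = new_q
    return q


def viterbi(position_dists, transitions):
    """Most likely tag sequence from per-position label distributions and
    transitions[i][j] = p(tag j | previous tag i), computed in log space."""
    def log(p):
        return math.log(max(p, 1e-12))  # floor zero probabilities

    n = len(TAGS)
    score = [log(p) for p in position_dists[0]]  # no start transitions, for brevity
    backpointers = []
    for dist in position_dists[1:]:
        new_score, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log(transitions[i][j]))
            new_score.append(score[best_i] + log(transitions[best_i][j]) + log(dist[j]))
            ptr.append(best_i)
        score = new_score
        backpointers.append(ptr)
    # Trace back from the highest-scoring final tag.
    best = max(range(n), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backpointers):
        best = ptr[best]
        path.append(best)
    return [TAGS[j] for j in reversed(path)]


# Toy data: vertex "sentence:position" -> hypothetical CRF posterior.
posteriors = {
    "s1:0": [0.9, 0.05, 0.05],  # "levels"
    "s1:1": [0.9, 0.05, 0.05],  # "of"
    "s1:2": [0.6, 0.4, 0.0],    # "XYZ1": the CRF is unsure and leans toward O
    "s2:2": [0.05, 0.95, 0.0],  # "XYZ1" in another sentence: confident gene
}
edges = {"s1:2": [("s2:2", 1.0)], "s2:2": [("s1:2", 1.0)]}
transitions = [
    [0.5, 0.5, 0.0],  # from O (O -> I-GENE is not allowed)
    [0.4, 0.2, 0.4],  # from B-GENE
    [0.5, 0.2, 0.3],  # from I-GENE
]
sentence = ["s1:0", "s1:1", "s1:2"]
before = viterbi([posteriors[v] for v in sentence], transitions)
q = propagate(posteriors, edges)
after = viterbi([q[v] for v in sentence], transitions)
print(before)  # ['O', 'O', 'O']
print(after)   # ['O', 'O', 'B-GENE']
```

In this toy example the CRF alone leaves the ambiguous token tagged O, but once its distribution is pulled toward a confidently tagged similar vertex, Viterbi decoding selects B-GENE. Keeping the transition probabilities separate from the propagated distributions lets decoding still enforce valid tag sequences, such as ruling out O followed by I-GENE.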
GraphNER works with the data format of the BioCreative II shared task, which is also supported by BANNER and BANNER-ChemDNER. Data from the gene mention detection subtask of the BioCreative II shared task is available for testing.
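To illustrate how gene mention annotations in this format can be turned into the B-GENE, I-GENE, and O tags described earlier, here is a minimal Python sketch. It assumes the annotation layout `ID|start end|mention`, where `start` and `end` are inclusive character offsets that ignore whitespace. The function name `bc2_spans_to_bio`, the whitespace tokenization, and the example sentence and ID are hypothetical and not part of GraphNER.

```python
def bc2_spans_to_bio(sentence, annotations):
    """Convert BioCreative II-style gene annotations to BIO tags.

    Assumed annotation format: "ID|start end|mention", where start and end
    are inclusive character offsets that skip whitespace characters.
    Tokens come from a plain whitespace split (an assumption; real
    tokenizers also split punctuation).
    """
    tokens = sentence.split()
    # Whitespace-free offsets of each token's first and last character.
    bounds, offset = [], 0
    for tok in tokens:
        bounds.append((offset, offset + len(tok) - 1))
        offset += len(tok)

    tags = ["O"] * len(tokens)
    for line in annotations:
        _, span, _ = line.split("|", 2)  # the mention text may itself contain "|"
        start, end = map(int, span.split())
        # Only tokens lying entirely inside the span are tagged here.
        inside = [i for i, (s, e) in enumerate(bounds) if s >= start and e <= end]
        for k, i in enumerate(inside):
            tags[i] = "B-GENE" if k == 0 else "I-GENE"
    return list(zip(tokens, tags))


result = bc2_spans_to_bio(
    "Mutations in the BRCA1 gene increase risk",
    ["S1|14 22|BRCA1 gene"],  # hypothetical sentence ID and annotation
)
print(result)
# [('Mutations', 'O'), ('in', 'O'), ('the', 'O'), ('BRCA1', 'B-GENE'),
#  ('gene', 'I-GENE'), ('increase', 'O'), ('risk', 'O')]
```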
Experimental Releases
GraphNER 1.0 alpha release (Dec 09, 2016)
This is not a final release. Experimental releases should only be used for testing and development. Do not use these on production sites, and make sure you have proper backups before installing.
GraphNER download for all platforms (386 MB)