Following up on the scientific literature is a time-consuming task, made harder by the considerable increase in available data. Currently, Locus-Specific Databases (LSDBs) are mainly curated manually and play a crucial role in the interpretation of variants. However, the transition to Next Generation Sequencing requires thousands of them to be created, which is difficult to achieve manually. Text mining and the automation of information extraction have therefore become key elements of the biocuration process. We developed BioKnExt, a Biological Knowledge Extractor, which combines different approaches to identify genes and their associated variations and to extract them automatically from the literature. On a Marfan syndrome use case of 550 articles, we showed that BioKnExt retrieves the 3,645 variations of the FBN1, MYH11, ACTA2, MYLK and SMAD3 genes with an average F1 score of 0.934. We believe that BioKnExt can efficiently assist human curators in most steps of the biocuration process, both to update reference databases and to extract variations from the literature.
BioKnExt is freely available to non-commercial users. However, copying all or part of the database content is not permitted without our specific authorisation. Commercial users should contact us to obtain a dedicated license.