📄 Abstract
Background: Medical and life science research generates millions of
publications, and it is a great challenge for researchers to use this
information in full because its scale and complexity greatly surpass human
reading capabilities. Automated text mining can help extract and connect
information spread across this large body of literature, but this technology is
not easily accessible to life scientists.
Methods and Results: Here, we developed an easy-to-use, end-to-end pipeline for
deep learning- and dictionary-based named entity recognition (NER) of typical
entities found in medical and life science research articles, including
diseases, cells, chemicals, genes/proteins, species and others. The pipeline
can access and process large medical research article collections (PubMed,
CORD-19) or raw text, and it incorporates a series of deep learning models
fine-tuned on the HUNER corpora collection. The pipeline can also perform
dictionary-based NER related to COVID-19 and other medical topics. Users can
load their own NER models and dictionaries to include additional entities. The
output consists of publication-ready ranked lists and graphs of detected
entities, together with files containing the annotated texts. In addition, we
provide two accessory scripts that allow processing of files in PubTator format
and rapid inspection of the results for specific entities of interest.
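To make the dictionary-based NER and the ranked entity lists more concrete, here is a minimal sketch of the general technique, not EasyNER's actual implementation. The dictionary `COVID_DICT` and the functions `build_pattern`, `dictionary_ner` and `rank_entities` are hypothetical illustrations; the assumed dictionary format (lower-case surface form mapped to a canonical entity name) is not taken from the paper. Matching is case-insensitive, requires that a hit is not flanked by word characters, and tries longer terms first so that multi-word synonyms win over shorter overlapping ones.

```python
import re
from collections import Counter

# Hypothetical dictionary (assumption, not EasyNER's format):
# lower-case surface form -> canonical entity name.
COVID_DICT = {
    "sars-cov-2": "SARS-CoV-2",
    "2019-ncov": "SARS-CoV-2",
    "covid-19": "COVID-19",
    "coronavirus disease 2019": "COVID-19",
    "ace2": "ACE2",
}


def build_pattern(dictionary):
    """Compile one case-insensitive regex matching any dictionary term."""
    # Longer terms first: at a given position, Python's regex tries
    # alternatives in order, so the longest applicable term is chosen.
    terms = sorted(dictionary, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in terms)
    # A hit must not be preceded or followed by a word character.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def dictionary_ner(text, dictionary, pattern):
    """Return (start, end, mention, canonical_name) for every dictionary hit."""
    return [
        (m.start(), m.end(), m.group(0), dictionary[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


def rank_entities(texts, dictionary):
    """Count canonical entities over all texts, most frequent first."""
    pattern = build_pattern(dictionary)
    counts = Counter()
    for text in texts:
        counts.update(hit[3] for hit in dictionary_ner(text, dictionary, pattern))
    return counts.most_common()


if __name__ == "__main__":
    abstracts = [
        "SARS-CoV-2 binds ACE2. COVID-19 is caused by SARS-CoV-2.",
        "Coronavirus disease 2019 (COVID-19) emerged in 2019-nCoV outbreaks.",
    ]
    for entity, count in rank_entities(abstracts, COVID_DICT):
        print(f"{entity}\t{count}")
```

Because synonyms are mapped to one canonical name before counting, "2019-nCoV" and "SARS-CoV-2" contribute to the same entry in the ranked list; a real pipeline would additionally handle tokenization, overlapping entity types and merging with the deep learning model output.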
As model use cases, the pipeline was deployed on two collections of
autophagy-related abstracts from PubMed and on the CORD-19 dataset, a collection
of 764 398 research article abstracts related to COVID-19.
Conclusions: The NER pipeline we present is applicable in a variety of medical
research settings and makes customizable text mining accessible to life
scientists.
Authors (11)
Rafsan Ahmed
Petter Berntsson
Alexander Skafte
Salma Kazemi Rashed
Marcus Klang
Adam Barvesten
+5 more
Key Contributions
Develops EasyNER, an easy-to-use, end-to-end pipeline for deep learning- and dictionary-based named entity recognition (NER) in medical and life science texts. It processes large collections such as PubMed and CORD-19 using deep learning models fine-tuned on the HUNER corpora, making advanced text mining accessible to researchers without deep NLP expertise.
Business Value
Accelerates scientific discovery by enabling researchers to quickly extract and connect critical information from vast biomedical literature, potentially leading to faster drug discovery and medical advancements.