Comparison of Named Entity Recognition Tools on News Articles (Sergey Vychegzhanin, ISPRASOPEN-2019)
- Sergey Vychegzhanin
Named entity recognition (NER) in text is an important natural language processing task. Many systems address it, and they differ in target domain, processing methodology, supported languages, and recognized entity types. This variety makes it hard for a user to choose the right tool for a specific problem. This work is a comparative study of seven publicly available, well-known libraries that can extract named entities: Stanford NER, spaCy, NLTK, Polyglot, Flair, GATE and DeepPavlov.
The talk consists of seven sections. The introduction lists the application areas of named entity recognition and the approaches used to solve it. The second section reviews earlier comparative studies of existing tools.
The third section describes the four text corpora used in the experiments. The fourth section briefly describes the tools selected for the study. The fifth section describes the metrics used to evaluate tool performance. The sixth section presents and discusses the experimental results. The conclusion summarizes the results of the work.
The results show that for English, the Flair and DeepPavlov libraries achieve close F1-scores on named entity recognition. For Russian, DeepPavlov ranks first, significantly surpassing the other tools in quality.
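For readers unfamiliar with the F1-score mentioned above, here is a minimal sketch of how it can be computed for NER output. It assumes exact-match, entity-level scoring, where a predicted entity counts as correct only if its span and type both match a gold entity. The abstract does not say which matching scheme the study used. The function name `precision_recall_f1`, the `(start, end, type)` tuple layout, and the sample data are all hypothetical.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (start offset, end offset, entity type).
Entity = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Entity],
                        predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match, entity-level precision, recall and F1 (an assumed scheme)."""
    # A prediction is a true positive only if span AND type match a gold entity.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data only, not taken from the study.
gold = {(0, 2, "PER"), (5, 6, "LOC"), (9, 11, "ORG")}
predicted = {(0, 2, "PER"), (5, 6, "ORG"), (9, 11, "ORG"), (13, 14, "LOC")}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In this toy case two of the four predictions match gold entities, so precision is 2/4, recall is 2/3, and F1 is 4/7. A token-level or partial-match scheme would give different numbers for the same output, which is one reason scores reported by different comparisons are not always directly comparable.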