Where can I find datasets for training and evaluating Named Entity Recognition models?

2023-08-30 / News

  There are several sources where you can find datasets for training and evaluating Named Entity Recognition (NER) models. Some of the popular ones are:

  1. CoNLL-2003: The CoNLL-2003 NER dataset is widely used for evaluating NER models. It contains named-entity annotations for English and German news text and uses four entity types: person, organization, location, and miscellaneous. The annotations are available from the CoNLL-2003 shared task website, but the underlying text (Reuters news for the English part) must be obtained separately.

  2. OntoNotes: The OntoNotes dataset is a large multilingual corpus that includes NER annotations. It covers English, Chinese, and Arabic across several genres, such as newswire, broadcast news, and web text, making it suitable for cross-lingual and domain adaptation experiments. The dataset is distributed through the LDC (Linguistic Data Consortium) under a license agreement.

  3. OpenNER: OpenNER is a collection of open-source NER datasets curated by the Zalando Research team. It includes datasets for multiple languages like English, German, Spanish, and Dutch. The datasets cover different domains, including news articles, biomedical literature, and legal documents.

  4. ACE: The Automatic Content Extraction (ACE) dataset is widely used for NER research. It includes annotations for multiple entity types, temporal expressions, relations, and events. However, obtaining this dataset may be challenging as it requires permission from the Linguistic Data Consortium.

  5. WikiNER: WikiNER is an NER dataset whose annotations were derived automatically from Wikipedia articles and their links. It covers multiple languages and includes annotations for person, organization, location, and miscellaneous entity types. The dataset is publicly available and can be downloaded from the WikiNER website.

  6. WebhoseIO: WebhoseIO is a commercial data provider that offers various datasets, including NER data. They provide pre-processed data extracted from the web, which can be useful for training NER models on large-scale real-world data. However, access to their data may require a subscription.
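  The CoNLL-2003 data (item 1) is distributed as plain-text column files, and other NER corpora commonly use the same column layout. The following minimal Python sketch shows a reader for that layout. It assumes one token per line, whitespace-separated columns with the token first and the NER tag last, blank lines between sentences, and `-DOCSTART-` lines between documents. The function name `read_conll` is hypothetical and is not part of any library. Check the layout against the copy of the data you actually obtain.

```python
from typing import List, Tuple

# One sentence is a list of (token, NER tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(text: str) -> List[Sentence]:
    """Parse CoNLL-2003-style column text into sentences.

    Assumed layout (hypothetical reader, not an official tool): one token
    per line, whitespace-separated columns with the token first and the NER
    tag last; blank lines separate sentences and lines starting with
    '-DOCSTART-' mark document boundaries.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("-DOCSTART-"):
            # A blank line or document marker closes the sentence in progress.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # Keep only the token (first column) and the NER tag (last column).
        current.append((columns[0], columns[-1]))
    if current:  # The text may not end with a blank line.
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    # Toy sample in the assumed four-column layout: token, POS, chunk, NER.
    sample = (
        "-DOCSTART- -X- -X- O\n"
        "\n"
        "EU NNP B-NP B-ORG\n"
        "rejects VBZ B-VP O\n"
        "\n"
        "Peter NNP B-NP B-PER\n"
        "Blackburn NNP I-NP I-PER\n"
    )
    for sentence in read_conll(sample):
        print(sentence)
```

  For a real file, call `read_conll(open(path, encoding="utf-8").read())`, where `path` is a placeholder for wherever your local copy of a split is stored. The result is one list of `(token, tag)` pairs per sentence, ready to be fed to a training or evaluation loop.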

  It's important to note that some of these datasets may have specific usage restrictions, licensing conditions, or require permission from the data providers. Before using any dataset, make sure to review and comply with the terms of use to avoid any legal issues.
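  When a corpus such as CoNLL-2003 is used for evaluation, results are usually reported as entity-level precision, recall, and F1. Under this scheme, a predicted entity counts as correct only if both its type and its exact span match the gold annotation. The following sketch illustrates that scoring in plain Python. The function names `extract_spans` and `entity_prf` are hypothetical. The code accepts both IOB1 tags (as in the original CoNLL-2003 files) and IOB2 tags. It is not a drop-in replacement for the official `conlleval` script, which may handle some edge cases differently.

```python
from typing import List, Optional, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect typed entity spans from IOB1 or IOB2 tags (hypothetical helper)."""
    spans: Set[Span] = set()
    start = 0
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final span
        prefix, _, label = tag.partition("-")
        # An I- tag of the same type extends the open span; anything else ends it.
        continues = prefix == "I" and label == etype
        if etype is not None and not continues:
            spans.add((etype, start, i))
            etype = None
        # B- always starts a span; I- starts one after O or a different type (IOB1).
        if prefix in ("B", "I") and not continues:
            start, etype = i, label
    return spans


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact (type, span) matches."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    true_pos = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans, pred_spans = extract_spans(gold_tags), extract_spans(pred_tags)
        true_pos += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "I-PER", "O", "B-ORG"]]
    print(entity_prf(gold, pred))  # → (0.5, 0.5, 0.5)
```

  In the example, the person entity matches exactly but the location is predicted with the wrong type, so it counts as both a miss and a false positive. For published results, an established implementation is the safer choice, for example the original `conlleval` script or the `seqeval` Python package.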

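#Disclaimer#

  All content and information resources on this site are provided for learning and research purposes only. They may not be reproduced without permission, and the content of this site may not be used for commercial or illegal purposes.
  All information on this site comes from AI Q&A, and this site accepts no responsibility for copyright disputes. The generated content has not been thoroughly verified, and this site has given full notice of this. Do not use it as a basis for scientific reference; otherwise you bear all consequences yourself. If you have any doubts about the content, please contact this site promptly.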