Feature-rich named entity recognition for Bulgarian using conditional random fields

Georgi Georgiev, Preslav Nakov, Kuzman Ganchev, Petya Osenova, Kiril Simov

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

12 Citations (Scopus)

Abstract

The paper presents a feature-rich approach to the automatic recognition and categorization of named entities (persons, organizations, locations, and miscellaneous) in news text for Bulgarian. We combine well-established features used for other languages with language-specific lexical, syntactic and morphological information. In particular, we make use of the rich tagset annotation of the BulTreeBank (680 morpho-syntactic tags), from which we derive suitable task-specific tagsets (local and nonlocal). We further add domain-specific gazetteers and additional unlabeled data, achieving F1 = 89.4%, which is comparable to the state-of-the-art results for English.
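As a rough illustration of the kind of per-token features the abstract describes (lexical cues, a reduced tagset derived from the full morpho-syntactic tag, and gazetteer lookup), the following Python sketch builds a feature dictionary for one token. It is not the authors' feature set: the function names, the choice to truncate a tag to its first two characters, the example tags, and the toy gazetteer are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def coarse_tag(full_tag: str, keep: int = 2) -> str:
    """Derive a reduced tag by keeping the first `keep` characters.

    Assumption: the tag is positional, so a prefix is a coarser category.
    """
    return full_tag[:keep]


def word_shape(token: str) -> str:
    """Collapse a token into a compressed shape, e.g. 'София' -> 'Xx'."""
    shape: List[str] = []
    for ch in token:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        else:
            c = ch
        # Skip repeats so that long words share a shape with short ones.
        if not shape or shape[-1] != c:
            shape.append(c)
    return "".join(shape)


def token_features(
    tokens: List[str], tags: List[str], i: int, gazetteer: Set[str]
) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "lower": tok.lower(),
        "shape": word_shape(tok),
        "suffix3": tok[-3:].lower(),
        "is_title": tok.istitle(),
        "tag_full": tags[i],
        "tag_coarse": coarse_tag(tags[i]),
        "in_gazetteer": tok.lower() in gazetteer,
    }
    # Context features from the neighbouring tokens.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
        feats["prev_tag_coarse"] = coarse_tag(tags[i - 1])
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
        feats["next_tag_coarse"] = coarse_tag(tags[i + 1])
    else:
        feats["EOS"] = True
    return feats


# Hypothetical sentence, tags, and gazetteer (illustrative values only).
tokens = ["Георги", "живее", "в", "София", "."]
tags = ["Npmsi", "Vpitf-r3s", "R", "Npfsi", "punct"]
locations = {"софия", "пловдив"}

features = [token_features(tokens, tags, i, locations) for i in range(len(tokens))]
```

A list of such dictionaries per sentence could then be handed to any CRF toolkit that accepts per-token feature dictionaries; which toolkit and which exact features the paper used is not stated in this record.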

Original language: English
Title of host publication: International Conference Recent Advances in Natural Language Processing, RANLP
Pages: 113-117
Number of pages: 5
Publication status: Published - 2009
Externally published: Yes
Event: International Conference on Recent Advances in Natural Language Processing, RANLP-2009 - Borovets, Bulgaria
Duration: 14 Sep 2009 to 16 Sep 2009

Keywords

  • Conditional random fields
  • Information extraction
  • Linear models
  • Machine learning
  • Morphology
  • Named entity recognition

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Software
  • Electrical and Electronic Engineering

Cite this

Georgiev, G., Nakov, P., Ganchev, K., Osenova, P., & Simov, K. (2009). Feature-rich named entity recognition for Bulgarian using conditional random fields. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 113-117).

@inproceedings{6a7cc2bb4125426a8fd8787491cabc85,
title = "Feature-rich named entity recognition for {B}ulgarian using conditional random fields",
abstract = "The paper presents a feature-rich approach to the automatic recognition and categorization of named entities (persons, organizations, locations, and miscellaneous) in news text for Bulgarian. We combine well-established features used for other languages with language-specific lexical, syntactic and morphological information. In particular, we make use of the rich tagset annotation of the BulTreeBank (680 morpho-syntactic tags), from which we derive suitable task-specific tagsets (local and nonlocal). We further add domain-specific gazetteers and additional unlabeled data, achieving F1 = 89.4{\%}, which is comparable to the state-of-the-art results for English.",
keywords = "Conditional random fields, Information extraction, Linear models, Machine learning, Morphology, Named entity recognition",
author = "Georgi Georgiev and Preslav Nakov and Kuzman Ganchev and Petya Osenova and Kiril Simov",
year = "2009",
language = "English",
pages = "113--117",
booktitle = "International Conference Recent Advances in Natural Language Processing, RANLP"
}

TY  - GEN
T1  - Feature-rich named entity recognition for Bulgarian using conditional random fields
AU  - Georgiev, Georgi
AU  - Nakov, Preslav
AU  - Ganchev, Kuzman
AU  - Osenova, Petya
AU  - Simov, Kiril
PY  - 2009
Y1  - 2009
AB  - The paper presents a feature-rich approach to the automatic recognition and categorization of named entities (persons, organizations, locations, and miscellaneous) in news text for Bulgarian. We combine well-established features used for other languages with language-specific lexical, syntactic and morphological information. In particular, we make use of the rich tagset annotation of the BulTreeBank (680 morpho-syntactic tags), from which we derive suitable task-specific tagsets (local and nonlocal). We further add domain-specific gazetteers and additional unlabeled data, achieving F1 = 89.4%, which is comparable to the state-of-the-art results for English.
KW  - Conditional random fields
KW  - Information extraction
KW  - Linear models
KW  - Machine learning
KW  - Morphology
KW  - Named entity recognition
UR  - http://www.scopus.com/inward/record.url?scp=84858321458&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84858321458&partnerID=8YFLogxK
M3  - Conference contribution
SP  - 113
EP  - 117
BT  - International Conference Recent Advances in Natural Language Processing, RANLP
ER  -