Special issue on statistical learning of natural language structured input and output

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

During the last decade, machine learning and, in particular, statistical approaches have become increasingly important for research in Natural Language Processing (NLP) and Computational Linguistics. Nowadays, most researchers in the field use machine learning, as it can significantly enhance both system design and performance. However, machine learning requires careful parameter tuning and feature engineering for representing language phenomena. The latter becomes more complex when the system input/output data is structured, since the designer has both to (i) engineer features for representing structure and model interdependent layers of information, which is usually a non-trivial task; and (ii) generate a structured output using classifiers, which, in their original form, were developed only for classification or regression. Research in empirical NLP has tackled this problem by constructing output structures as a combination of the predictions of independent local classifiers, optionally applying post-processing heuristics to correct incompatible outputs by enforcing global properties. More recently, advances in statistical learning theory, namely structured output spaces and kernel methods, have provided techniques for directly encoding dependencies between data items in a learning algorithm that performs global optimization. Within this framework, this special issue aims to study, compare, and reconcile the typical domain/task-specific NLP approaches to structured data with the most advanced machine learning methods. In particular, the selected papers analyze the use of diverse structured input/output approaches, ranging from re-ranking to joint constraint-based global models, for diverse natural language tasks, namely document ranking, syntactic parsing, sequence supertagging, and relation extraction between terms and entities. Overall, the experience with this special issue shows that, although a definitive unifying theory for encoding and generating structured information in NLP applications is still far from being shaped, some interesting and effective best practices can be defined to guide practitioners in modeling their own natural language applications over complex data.
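
To make the contrast between local and global prediction concrete, the sketch below is a minimal, purely illustrative example (not taken from any paper in this issue; the labels, scores, and transition constraint are invented). It decodes the same per-token classifier scores twice: first with independent per-token decisions, as a set of local classifiers would, and then with Viterbi search that performs global optimization under a validity constraint on label bigrams.

# Hypothetical sketch: local vs. global decoding over the same per-token
# scores, illustrating why structured outputs need more than independent
# classification. All numbers and labels are invented for illustration.
import numpy as np

LABELS = ["O", "B-ORG", "I-ORG"]        # simple BIO tagging scheme
# Scores that independent local classifiers might assign to each token
# (rows: tokens "The", "European", "Commission"; columns: labels).
scores = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.3, 0.5],
    [0.1, 0.2, 0.7],
])

# (i) Local decoding: argmax per token, ignoring label dependencies.
local = [LABELS[j] for j in scores.argmax(axis=1)]
# -> ['O', 'I-ORG', 'I-ORG']: invalid, since I-ORG cannot follow O.

# (ii) Global decoding: Viterbi search with a transition matrix that
# forbids the invalid label bigram, optimizing the sequence as a whole.
# (Only the one constraint needed for this example is encoded.)
NEG = -1e9
trans = np.zeros((3, 3))
trans[0, 2] = NEG                        # O -> I-ORG is forbidden
n, k = scores.shape
best = scores[0].copy()                  # best score ending in each label
back = np.zeros((n, k), dtype=int)       # backpointers for path recovery
for t in range(1, n):
    cand = best[:, None] + trans + scores[t][None, :]
    back[t] = cand.argmax(axis=0)
    best = cand.max(axis=0)
path = [int(best.argmax())]
for t in range(n - 1, 0, -1):
    path.append(int(back[t, path[-1]]))
glob = [LABELS[j] for j in reversed(path)]

print("local :", local)                  # ['O', 'I-ORG', 'I-ORG']
print("global:", glob)                   # ['O', 'B-ORG', 'I-ORG']

On these scores, local decoding yields the inconsistent sequence O, I-ORG, I-ORG, whereas global decoding recovers the valid O, B-ORG, I-ORG. Structured output methods generalize this idea by learning with such sequence-level dependencies directly, rather than repairing violations after the fact.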

Original language: English
Pages (from-to): 147-153
Number of pages: 7
Journal: Natural Language Engineering
Volume: 18
Issue number: 2
DOI: 10.1017/S135132491200006X
Publication status: Published - 1 Apr 2012
Externally published: Yes

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
  • Language and Linguistics
  • Linguistics and Language

Cite this

Special issue on statistical learning of natural language structured input and output. / Marques, Lluis; Moschitti, Alessandro.

In: Natural Language Engineering, Vol. 18, No. 2, 01.04.2012, p. 147-153.

Research output: Contribution to journal › Article

@article{b26ebd7db72843779e11aab6d30cbfc2,
title = "Special issue on statistical learning of natural language structured input and output",
author = "Lluis Marques and Alessandro Moschitti",
year = "2012",
month = "4",
day = "1",
doi = "10.1017/S135132491200006X",
language = "English",
volume = "18",
pages = "147--153",
journal = "Natural Language Engineering",
issn = "1351-3249",
publisher = "Cambridge University Press",
number = "2",
}

TY - JOUR

T1 - Special issue on statistical learning of natural language structured input and output

AU - Marques, Lluis

AU - Moschitti, Alessandro

PY - 2012/4/1

Y1 - 2012/4/1

UR - http://www.scopus.com/inward/record.url?scp=84858826439&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84858826439&partnerID=8YFLogxK

U2 - 10.1017/S135132491200006X

DO - 10.1017/S135132491200006X

M3 - Article

VL - 18

SP - 147

EP - 153

JO - Natural Language Engineering

JF - Natural Language Engineering

SN - 1351-3249

IS - 2

ER -