SuperCAT: The (new and improved) corpus analysis toolkit

K. Bretonnel Cohen, William A. Baumgartner, Irina Temnikova

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper reports on SuperCAT, a corpus analysis toolkit. It is a radical extension of SubCAT, the Sublanguage Corpus Analysis Toolkit, from sublanguage analysis to corpus analysis in general. The idea behind SuperCAT is that representative corpora have no tendency towards closure; that is, they tend towards infinity. In contrast, non-representative corpora have a tendency towards closure (roughly, finiteness). SuperCAT focuses on general techniques for the quantitative description of the characteristics of any corpus (or other language sample), particularly concerning the characteristics of lexical distributions. Additionally, SuperCAT features a complete re-engineering of the previous SubCAT architecture.
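
As a rough illustration of what a "tendency towards closure" can look like at the lexical level, the sketch below tracks how many distinct word types have been seen as more tokens are read. It is not SuperCAT's implementation, which the abstract does not describe; the function names `type_growth_curve` and `late_growth_rate`, the `step` parameter, and the choice of lowercased whitespace tokens are all assumptions made for this example. A curve that flattens out suggests closure, while one that keeps rising suggests none.

```python
from collections.abc import Iterable


def type_growth_curve(tokens: Iterable[str], step: int = 1000) -> list[tuple[int, int]]:
    """Return (tokens_seen, distinct_types_seen) pairs, sampled every `step` tokens.

    Hypothetical helper for illustration only.
    """
    seen: set[str] = set()
    curve: list[tuple[int, int]] = []
    count = 0
    for count, token in enumerate(tokens, start=1):
        seen.add(token.lower())  # assumption: case-insensitive word types
        if count % step == 0:
            curve.append((count, len(seen)))
    # Record the final point when the corpus length is not a multiple of `step`.
    if count and count % step != 0:
        curve.append((count, len(seen)))
    return curve


def late_growth_rate(curve: list[tuple[int, int]]) -> float:
    """New types per token over roughly the second half of the sampled curve.

    A value near 0 suggests the vocabulary has stopped growing (closure);
    a value well above 0 suggests it is still growing.
    """
    if len(curve) < 2:
        raise ValueError("need at least two sample points")
    mid_tokens, mid_types = curve[len(curve) // 2 - 1]
    end_tokens, end_types = curve[-1]
    return (end_types - mid_types) / (end_tokens - mid_tokens)


if __name__ == "__main__":
    # Toy "closed" sample: four word types repeated over and over.
    closed = ["take", "one", "tablet", "daily"] * 50
    # Toy "open" sample: every token is a new type.
    open_ended = [f"word{i}" for i in range(200)]

    print(late_growth_rate(type_growth_curve(closed, step=50)))
    print(late_growth_rate(type_growth_curve(open_ended, step=50)))
```

On real data the two extremes above rarely occur; the interesting comparison is between the late growth rates of different corpora sampled at the same sizes.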

Original language: English
Title of host publication: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
Publisher: European Language Resources Association (ELRA)
Pages: 2784-2788
Number of pages: 5
ISBN (Electronic): 9782951740891
Publication status: Published - 1 Jan 2016
Event: 10th International Conference on Language Resources and Evaluation, LREC 2016 - Portoroz, Slovenia
Duration: 23 May 2016 to 28 May 2016

Other

Other: 10th International Conference on Language Resources and Evaluation, LREC 2016
Country: Slovenia
City: Portoroz
Period: 23/5/16 to 28/5/16

Keywords

  • Corpus
  • Representativeness
  • Sublanguage
  • Toolkit

ASJC Scopus subject areas

  • Linguistics and Language
  • Library and Information Sciences
  • Language and Linguistics
  • Education

Cite this

Bretonnel Cohen, K., Baumgartner, W. A., & Temnikova, I. (2016). SuperCAT: The (new and improved) corpus analysis toolkit. In Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016 (pp. 2784-2788). European Language Resources Association (ELRA).