SuperCAT: The (new and improved) corpus analysis toolkit

K. Bretonnel Cohen, William A. Baumgartner, Irina Temnikova

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper reports SuperCAT, a corpus analysis toolkit. It is a radical extension of SubCAT, the Sublanguage Corpus Analysis Toolkit, from sublanguage analysis to corpus analysis in general. The idea behind SuperCAT is that representative corpora have no tendency towards closure (that is, they tend towards infinity). In contrast, non-representative corpora have a tendency towards closure (roughly, finiteness). SuperCAT focuses on general techniques for the quantitative description of the characteristics of any corpus (or other language sample), particularly concerning the characteristics of lexical distributions. Additionally, SuperCAT features a complete re-engineering of the previous SubCAT architecture.
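The notion of closure in the abstract can be pictured with a vocabulary growth curve: count the distinct word types seen as more tokens are read, and check whether the count flattens out. The following is a minimal sketch of that general idea only, not SuperCAT's implementation; the function names `vocabulary_growth` and `new_types_in_tail`, the whitespace tokenization, and the toy sentences are all assumptions made for illustration.

```python
from typing import Iterable, List


def vocabulary_growth(tokens: Iterable[str], step: int = 1) -> List[int]:
    """Return the number of distinct types seen after every `step` tokens."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok.lower())
        if i % step == 0:
            curve.append(len(seen))
    return curve


def new_types_in_tail(curve: List[int], tail_fraction: float = 0.5) -> int:
    """Count the types that first appear in the final part of the curve.

    A small value suggests the sample is approaching lexical closure;
    a large value suggests the vocabulary is still growing.
    """
    if not curve:
        return 0
    start = min(int(len(curve) * (1 - tail_fraction)), len(curve) - 1)
    return curve[-1] - curve[start]


# Toy samples (hypothetical): a repetitive one and an ever-growing one.
closed = "the valve is open the valve is closed the valve is open".split()
open_ended = "each new word here adds another unseen lexical item again".split()

closed_curve = vocabulary_growth(closed)
open_curve = vocabulary_growth(open_ended)

print(closed_curve, new_types_in_tail(closed_curve))
print(open_curve, new_types_in_tail(open_curve))
```

On real corpora the curves would be computed over far larger samples, and a flattening curve is only one informal indicator of closure among several possible measures.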

Original language: English
Title of host publication: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
Publisher: European Language Resources Association (ELRA)
Pages: 2784-2788
Number of pages: 5
ISBN (Electronic): 9782951740891
Publication status: Published - 1 Jan 2016
Event: 10th International Conference on Language Resources and Evaluation, LREC 2016 - Portoroz, Slovenia
Duration: 23 May 2016 to 28 May 2016

Keywords

  • Corpus
  • Representativeness
  • Sublanguage
  • Toolkit

ASJC Scopus subject areas

  • Linguistics and Language
  • Library and Information Sciences
  • Language and Linguistics
  • Education

Cite this

Bretonnel Cohen, K., Baumgartner, W. A., & Temnikova, I. (2016). SuperCAT: The (new and improved) corpus analysis toolkit. In Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016 (pp. 2784-2788). European Language Resources Association (ELRA).

@inproceedings{7925b57e555645ffbf684c2c6b4e607d,
title = "SuperCAT: The (new and improved) corpus analysis toolkit",
abstract = "This paper reports SuperCAT, a corpus analysis toolkit. It is a radical extension of SubCAT, the Sublanguage Corpus Analysis Toolkit, from sublanguage analysis to corpus analysis in general. The idea behind SuperCAT is that representative corpora have no tendency towards closure (that is, they tend towards infinity). In contrast, non-representative corpora have a tendency towards closure (roughly, finiteness). SuperCAT focuses on general techniques for the quantitative description of the characteristics of any corpus (or other language sample), particularly concerning the characteristics of lexical distributions. Additionally, SuperCAT features a complete re-engineering of the previous SubCAT architecture.",
keywords = "Corpus, Representativeness, Sublanguage, Toolkit",
author = "{Bretonnel Cohen}, K. and Baumgartner, {William A.} and Irina Temnikova",
year = "2016",
month = "1",
day = "1",
language = "English",
pages = "2784--2788",
booktitle = "Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016",
publisher = "European Language Resources Association (ELRA)",
}

TY - GEN
T1 - SuperCAT
T2 - The (new and improved) corpus analysis toolkit
AU - Bretonnel Cohen, K.
AU - Baumgartner, William A.
AU - Temnikova, Irina
PY - 2016/1/1
Y1 - 2016/1/1
N2 - This paper reports SuperCAT, a corpus analysis toolkit. It is a radical extension of SubCAT, the Sublanguage Corpus Analysis Toolkit, from sublanguage analysis to corpus analysis in general. The idea behind SuperCAT is that representative corpora have no tendency towards closure (that is, they tend towards infinity). In contrast, non-representative corpora have a tendency towards closure (roughly, finiteness). SuperCAT focuses on general techniques for the quantitative description of the characteristics of any corpus (or other language sample), particularly concerning the characteristics of lexical distributions. Additionally, SuperCAT features a complete re-engineering of the previous SubCAT architecture.
KW - Corpus
KW - Representativeness
KW - Sublanguage
KW - Toolkit
UR - http://www.scopus.com/inward/record.url?scp=85037111367&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85037111367&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85037111367
SP - 2784
EP - 2788
BT - Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
PB - European Language Resources Association (ELRA)
ER -