Playscript classification and automatic Wikipedia play articles generation

Siddhartha Banerjee, Cornelia Caragea, Prasenjit Mitra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

In this work, we aim to create Wikipedia pages on plays automatically by extracting relevant information from various web sources. Our approach involves building an efficient classifier that identifies web documents that are play scripts. From the correctly classified play scripts, we extract relevant play-related information and use it to obtain additional information from various sources on the web. This information is aggregated, and a bot uses it to create human-readable Wikipedia pages. The results of our experiments show that classifiers trained on a combination of our designed features and 'bag-of-words' (BoW) features outperform classifiers trained using only BoW features. Our approach further shows that good-quality, human-readable pages can be created using our bot. Such an automatic page generation process can eventually help make Wikipedia more complete.
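The feature combination described in the abstract can be illustrated with a minimal sketch. The abstract does not list the designed features, so the two structural features below (share of speaker-cue lines and share of act/scene headings), the regular expressions, and all function names are assumptions made for illustration only, not the authors' feature set.

```python
import re
from collections import Counter

# Hypothetical cue patterns (assumptions, not taken from the paper).
SPEAKER_CUE = re.compile(r"^[A-Z][A-Z .']{1,30}[:.]\s")       # e.g. "HAMLET: ..."
ACT_SCENE = re.compile(r"^\s*(ACT|SCENE)\b", re.IGNORECASE)   # e.g. "ACT I"


def bow_features(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return [counts[word] for word in vocabulary]


def designed_features(text):
    """Return two illustrative structural features of a candidate play script."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return [0.0, 0.0]
    speaker = sum(1 for line in lines if SPEAKER_CUE.match(line))
    headings = sum(1 for line in lines if ACT_SCENE.match(line))
    return [speaker / len(lines), headings / len(lines)]


def combined_features(text, vocabulary):
    """Concatenate bag-of-words counts with the designed features."""
    return bow_features(text, vocabulary) + designed_features(text)


sample = "ACT I\nHAMLET: To be, or not to be.\nOPHELIA: Good my lord.\nThey exit."
vocabulary = ["be", "lord", "price"]
vector = combined_features(sample, vocabulary)
print(vector)
```

A vector built this way could be passed to any standard classifier; comparing a model trained on `bow_features` alone with one trained on `combined_features` would mirror the comparison the abstract reports.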

Original language: English
Title of host publication: Proceedings - International Conference on Pattern Recognition
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3630-3635
Number of pages: 6
ISBN (Print): 9781479952083
DOI: https://doi.org/10.1109/ICPR.2014.624
Publication status: Published - 4 Dec 2014
Externally published: Yes
Event: 22nd International Conference on Pattern Recognition, ICPR 2014 - Stockholm
Duration: 24 Aug 2014 to 28 Aug 2014

Fingerprint

Classifiers
Experiments

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

Cite this

Banerjee, S., Caragea, C., & Mitra, P. (2014). Playscript classification and automatic Wikipedia play articles generation. In Proceedings - International Conference on Pattern Recognition (pp. 3630-3635). Article 6977336. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICPR.2014.624
