Automatic detection of pseudocodes in scholarly documents using machine learning

Suppawong Tuarob, Sumit Bhatia, Prasenjit Mitra, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

27 Citations (Scopus)

Abstract

A significant number of scholarly articles in computer science and other disciplines contain algorithms that provide concise descriptions for solving a wide variety of computational problems. For example, Dijkstra's algorithm describes how to find the shortest paths between two nodes in a graph. Automatic identification and extraction of these algorithms from scholarly digital documents would enable automatic algorithm indexing, searching, analysis and discovery. An algorithm search engine, which identifies pseudocodes in scholarly documents and makes them searchable, has been implemented as part of the CiteSeerX suite. Here, we illustrate the limitations of the state-of-the-art rule-based pseudocode detection approach and present a novel set of machine-learning-based techniques that extend previous methods.

Original language: English
Title of host publication: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Pages: 738-742
Number of pages: 5
DOIs: https://doi.org/10.1109/ICDAR.2013.151
Publication status: Published - 2013
Externally published: Yes
Event: 12th International Conference on Document Analysis and Recognition, ICDAR 2013 - Washington, DC, United States
Duration: 25 Aug 2013 to 28 Aug 2013

Fingerprint

Learning systems
Search engines
Computer science

Keywords

  • Algorithm
  • Classification
  • Experiment
  • Machine Learning
  • Pseudocode

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

Cite this

Tuarob, S., Bhatia, S., Mitra, P., & Giles, C. L. (2013). Automatic detection of pseudocodes in scholarly documents using machine learning. In Proceedings of the International Conference on Document Analysis and Recognition, ICDAR (pp. 738-742). [6628716] https://doi.org/10.1109/ICDAR.2013.151

Automatic detection of pseudocodes in scholarly documents using machine learning. / Tuarob, Suppawong; Bhatia, Sumit; Mitra, Prasenjit; Giles, C. Lee.

Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. 2013. p. 738-742 6628716.

Tuarob, S, Bhatia, S, Mitra, P & Giles, CL 2013, Automatic detection of pseudocodes in scholarly documents using machine learning. in Proceedings of the International Conference on Document Analysis and Recognition, ICDAR., 6628716, pp. 738-742, 12th International Conference on Document Analysis and Recognition, ICDAR 2013, Washington, DC, United States, 25/8/13. https://doi.org/10.1109/ICDAR.2013.151
Tuarob S, Bhatia S, Mitra P, Giles CL. Automatic detection of pseudocodes in scholarly documents using machine learning. In Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. 2013. p. 738-742. 6628716 https://doi.org/10.1109/ICDAR.2013.151
Tuarob, Suppawong ; Bhatia, Sumit ; Mitra, Prasenjit ; Giles, C. Lee. / Automatic detection of pseudocodes in scholarly documents using machine learning. Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. 2013. pp. 738-742
@inproceedings{5b828e80b7ef44bf9afdcb484baddabb,
title = "Automatic detection of pseudocodes in scholarly documents using machine learning",
abstract = "A significant number of scholarly articles in computer science and other disciplines contain algorithms that provide concise descriptions for solving a wide variety of computational problems. For example, Dijkstra's algorithm describes how to find the shortest paths between two nodes in a graph. Automatic identification and extraction of these algorithms from scholarly digital documents would enable automatic algorithm indexing, searching, analysis and discovery. An algorithm search engine, which identifies pseudocodes in scholarly documents and makes them searchable, has been implemented as part of the CiteSeerX suite. Here, we illustrate the limitations of the state-of-the-art rule-based pseudocode detection approach and present a novel set of machine-learning-based techniques that extend previous methods.",
keywords = "Algorithm, Classification, Experiment, Machine Learning, Pseudocode",
author = "Suppawong Tuarob and Sumit Bhatia and Prasenjit Mitra and Giles, {C. Lee}",
year = "2013",
doi = "10.1109/ICDAR.2013.151",
language = "English",
pages = "738--742",
booktitle = "Proceedings of the International Conference on Document Analysis and Recognition, ICDAR",
}

TY - GEN

T1 - Automatic detection of pseudocodes in scholarly documents using machine learning

AU - Tuarob, Suppawong

AU - Bhatia, Sumit

AU - Mitra, Prasenjit

AU - Giles, C. Lee

PY - 2013

Y1 - 2013

AB - A significant number of scholarly articles in computer science and other disciplines contain algorithms that provide concise descriptions for solving a wide variety of computational problems. For example, Dijkstra's algorithm describes how to find the shortest paths between two nodes in a graph. Automatic identification and extraction of these algorithms from scholarly digital documents would enable automatic algorithm indexing, searching, analysis and discovery. An algorithm search engine, which identifies pseudocodes in scholarly documents and makes them searchable, has been implemented as part of the CiteSeerX suite. Here, we illustrate the limitations of the state-of-the-art rule-based pseudocode detection approach and present a novel set of machine-learning-based techniques that extend previous methods.

KW - Algorithm

KW - Classification

KW - Experiment

KW - Machine Learning

KW - Pseudocode

UR - http://www.scopus.com/inward/record.url?scp=84889575897&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84889575897&partnerID=8YFLogxK

U2 - 10.1109/ICDAR.2013.151

DO - 10.1109/ICDAR.2013.151

M3 - Conference contribution

AN - SCOPUS:84889575897

SP - 738

EP - 742

BT - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR

ER -