Solving substitution ciphers for OCR with a semi-supervised hidden Markov model

Erik Scharwachter, Stephan Vogel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

In the past, unsupervised HMM training has been applied to solve letter substitution ciphers as they appear in various problems in Natural Language Processing. For some problems, parts of the cipher key can easily be provided by the user, but full manual deciphering would be too time-consuming. In this work, a semi-supervised HMM deciphering approach that uses partial ground-truth data is introduced and evaluated empirically on synthetic and real-life data for Arabic Optical Character Recognition (OCR). Adding only a small amount of supervision improves deciphering performance drastically under optimal conditions, especially for short ciphertexts. In complex real-life scenarios, results are better than those of the unsupervised baseline approach.
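The general idea behind HMM-based deciphering can be illustrated with a small sketch. It assumes the common unsupervised decipherment setup: hidden states are plaintext letters, observations are cipher symbols, the transition matrix is a fixed plaintext character language model, and EM (Baum-Welch restricted to the emission table) re-estimates only the key. The semi-supervised part is modeled here, as an assumption, by clamping user-supplied key entries after every M-step; the paper's exact way of injecting partial ground truth may differ. All function names (`forward_backward`, `clamp_key`, `semi_supervised_decipher`), the toy bigram matrix, and the key are hypothetical, and a strict one-to-one substitution is assumed.

```python
import numpy as np


def forward_backward(obs, trans, emit, init):
    """Scaled forward-backward; returns per-position state posteriors and log-likelihood."""
    T, N = len(obs), trans.shape[0]
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    scale = np.zeros(T)
    alpha[0] = init * emit[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1]) / scale[t + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma, float(np.log(scale).sum())


def clamp_key(emit, known):
    """Impose user-supplied key entries {plaintext_state: cipher_symbol} and renormalize rows."""
    emit = emit.copy()
    for p, c in known.items():
        emit[:, c] = 0.0  # no other plaintext letter may produce cipher symbol c
        emit[p, :] = 0.0  # plaintext letter p produces only c
        emit[p, c] = 1.0
    return emit / emit.sum(axis=1, keepdims=True)


def semi_supervised_decipher(cipher, trans, init, n_cipher, known=None, iters=30, seed=0):
    """EM over the emission (key) table only; the plaintext bigram LM in `trans` stays fixed."""
    known = known or {}
    rng = np.random.default_rng(seed)
    n_plain = trans.shape[0]
    emit = clamp_key(rng.random((n_plain, n_cipher)) + 0.1, known)
    for _ in range(iters):
        gamma, _ = forward_backward(cipher, trans, emit, init)  # E-step
        counts = np.full((n_plain, n_cipher), 1e-6)  # tiny floor keeps every row normalizable
        for t, c in enumerate(cipher):
            counts[:, c] += gamma[t]  # expected count of (plaintext state, cipher symbol)
        emit = clamp_key(counts, known)  # M-step, then re-impose the partial key
    gamma, loglik = forward_backward(cipher, trans, emit, init)
    return emit, gamma.argmax(axis=1), loglik  # posterior (per-position) decoding


# Toy usage with a hypothetical 3-letter plaintext LM that prefers 0 -> 1 -> 2 -> 0.
trans = np.array([[0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.8, 0.1, 0.1]])
init = np.full(3, 1 / 3)
plain = [0, 1, 2, 0, 1, 2, 0, 1]
secret_key = {0: 2, 1: 0, 2: 1}  # hypothetical key; the user only reveals two entries
cipher = [secret_key[p] for p in plain]
emit, decoded, loglik = semi_supervised_decipher(cipher, trans, init, 3, known={0: 2, 1: 0})
print(decoded.tolist())  # [0, 1, 2, 0, 1, 2, 0, 1]
```

In this toy case the two revealed entries leave only one admissible cipher symbol for the remaining letter, so the key is fully determined; with longer alphabets and fewer revealed entries, the clamped entries instead constrain the EM search, which is the intuition behind supervision helping most on short ciphertexts.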

Original language: English
Title of host publication: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Publisher: IEEE Computer Society
Pages: 11-15
Number of pages: 5
Volume: 2015-November
ISBN (Print): 9781479918058
DOI: https://doi.org/10.1109/ICDAR.2015.7333716
Publication status: Published - 20 Nov 2015
Event: 13th International Conference on Document Analysis and Recognition, ICDAR 2015 - Nancy, France
Duration: 23 Aug 2015 to 26 Aug 2015

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

Cite this

APA: Scharwachter, E., & Vogel, S. (2015). Solving substitution ciphers for OCR with a semi-supervised hidden Markov model. In Proceedings of the International Conference on Document Analysis and Recognition, ICDAR (Vol. 2015-November, pp. 11-15). [7333716] IEEE Computer Society. https://doi.org/10.1109/ICDAR.2015.7333716

Standard: Solving substitution ciphers for OCR with a semi-supervised hidden Markov model. / Scharwachter, Erik; Vogel, Stephan.

Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. Vol. 2015-November. IEEE Computer Society, 2015. p. 11-15 7333716.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Harvard: Scharwachter, E & Vogel, S 2015, Solving substitution ciphers for OCR with a semi-supervised hidden Markov model. in Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. vol. 2015-November, 7333716, IEEE Computer Society, pp. 11-15, 13th International Conference on Document Analysis and Recognition, ICDAR 2015, Nancy, France, 23/8/15. https://doi.org/10.1109/ICDAR.2015.7333716

Vancouver: Scharwachter E, Vogel S. Solving substitution ciphers for OCR with a semi-supervised hidden Markov model. In Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. Vol. 2015-November. IEEE Computer Society. 2015. p. 11-15. 7333716 https://doi.org/10.1109/ICDAR.2015.7333716

Author: Scharwachter, Erik ; Vogel, Stephan. / Solving substitution ciphers for OCR with a semi-supervised hidden Markov model. Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. Vol. 2015-November. IEEE Computer Society, 2015. pp. 11-15
@inproceedings{88600a9e26a9447680aa078c852d32d1,
title = "Solving substitution ciphers for OCR with a semi-supervised hidden Markov model",
abstract = "In the past, unsupervised HMM training has been applied to solve letter substitution ciphers as they appear in various problems in Natural Language Processing. For some problems, parts of the cipher key can easily be provided by the user, but full manual deciphering would be too time-consuming. In this work, a semi-supervised HMM deciphering approach that uses partial ground-truth data is introduced and evaluated empirically on synthetic and real-life data for Arabic Optical Character Recognition (OCR). Adding only a small amount of supervision improves deciphering performance drastically under optimal conditions, especially for short ciphertexts. In complex real-life scenarios, results are better than those of the unsupervised baseline approach.",
author = "Erik Scharwachter and Stephan Vogel",
year = "2015",
month = "11",
day = "20",
doi = "10.1109/ICDAR.2015.7333716",
language = "English",
isbn = "9781479918058",
volume = "2015-November",
pages = "11--15",
booktitle = "Proceedings of the International Conference on Document Analysis and Recognition, ICDAR",
publisher = "IEEE Computer Society",

}

TY - GEN

T1 - Solving substitution ciphers for OCR with a semi-supervised hidden Markov model

AU - Scharwachter, Erik

AU - Vogel, Stephan

PY - 2015/11/20

Y1 - 2015/11/20

N2 - In the past, unsupervised HMM training has been applied to solve letter substitution ciphers as they appear in various problems in Natural Language Processing. For some problems, parts of the cipher key can easily be provided by the user, but full manual deciphering would be too time-consuming. In this work, a semi-supervised HMM deciphering approach that uses partial ground-truth data is introduced and evaluated empirically on synthetic and real-life data for Arabic Optical Character Recognition (OCR). Adding only a small amount of supervision improves deciphering performance drastically under optimal conditions, especially for short ciphertexts. In complex real-life scenarios, results are better than those of the unsupervised baseline approach.

AB - In the past, unsupervised HMM training has been applied to solve letter substitution ciphers as they appear in various problems in Natural Language Processing. For some problems, parts of the cipher key can easily be provided by the user, but full manual deciphering would be too time-consuming. In this work, a semi-supervised HMM deciphering approach that uses partial ground-truth data is introduced and evaluated empirically on synthetic and real-life data for Arabic Optical Character Recognition (OCR). Adding only a small amount of supervision improves deciphering performance drastically under optimal conditions, especially for short ciphertexts. In complex real-life scenarios, results are better than those of the unsupervised baseline approach.

UR - http://www.scopus.com/inward/record.url?scp=84962510118&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84962510118&partnerID=8YFLogxK

U2 - 10.1109/ICDAR.2015.7333716

DO - 10.1109/ICDAR.2015.7333716

M3 - Conference contribution

SN - 9781479918058

VL - 2015-November

SP - 11

EP - 15

BT - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR

PB - IEEE Computer Society

ER -