Block-suffix shifting

fast, simultaneous medical concept set identification in large medical record corpora.

Ying Liu, Lucian Vlad Lita, Radu Stefan Niculescu, Prasenjit Mitra, C. Lee Giles

Research output: Contribution to journalArticle

Abstract

Owing to new advances in computer hardware, large text databases have become more prevalent than ever.Automatically mining information from these databases proves to be a challenge due to slow pattern/string matching techniques. In this paper we present a new, fast multi-string pattern matching method based on the well known Aho-Chorasick algorithm. Advantages of our algorithm include:the ability to exploit the natural structure of text, the ability to perform significant character shifting, avoiding backtracking jumps that are not useful, efficiency in terms of matching time and avoiding the typical "sub-string" false positive errors.Our algorithm is applicable to many fields with free text, such as the health care domain and the scientific document field. In this paper, we apply the BSS algorithm to health care data and mine hundreds of thousands of medical concepts from a large Electronic Medical Record (EMR) corpora simultaneously and efficiently. Experimental results show the superiority of our algorithm when compared with the top of the line multi-string matching algorithms.

Original languageEnglish
Pages (from-to)424-428
Number of pages5
JournalAMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
Publication statusPublished - 2008
Externally publishedYes

Fingerprint

Medical Records
Databases
Delivery of Health Care
Electronic Health Records
benzoylprop-ethyl

ASJC Scopus subject areas

  • Medicine(all)

Cite this

Block-suffix shifting : fast, simultaneous medical concept set identification in large medical record corpora. / Liu, Ying; Lita, Lucian Vlad; Niculescu, Radu Stefan; Mitra, Prasenjit; Giles, C. Lee.

In: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 2008, p. 424-428.

Research output: Contribution to journalArticle

@article{ba6ec9d8cffc45199313b309400aa397,
title = "Block-suffix shifting: fast, simultaneous medical concept set identification in large medical record corpora.",
abstract = "Owing to new advances in computer hardware, large text databases have become more prevalent than ever.Automatically mining information from these databases proves to be a challenge due to slow pattern/string matching techniques. In this paper we present a new, fast multi-string pattern matching method based on the well known Aho-Chorasick algorithm. Advantages of our algorithm include:the ability to exploit the natural structure of text, the ability to perform significant character shifting, avoiding backtracking jumps that are not useful, efficiency in terms of matching time and avoiding the typical {"}sub-string{"} false positive errors.Our algorithm is applicable to many fields with free text, such as the health care domain and the scientific document field. In this paper, we apply the BSS algorithm to health care data and mine hundreds of thousands of medical concepts from a large Electronic Medical Record (EMR) corpora simultaneously and efficiently. Experimental results show the superiority of our algorithm when compared with the top of the line multi-string matching algorithms.",
author = "Ying Liu and Lita, {Lucian Vlad} and Niculescu, {Radu Stefan} and Prasenjit Mitra and Giles, {C. Lee}",
year = "2008",
language = "English",
pages = "424--428",
journal = "AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium",
issn = "1559-4076",
publisher = "American Medical Informatics Association",

}

TY - JOUR

T1 - Block-suffix shifting

T2 - fast, simultaneous medical concept set identification in large medical record corpora.

AU - Liu, Ying

AU - Lita, Lucian Vlad

AU - Niculescu, Radu Stefan

AU - Mitra, Prasenjit

AU - Giles, C. Lee

PY - 2008

Y1 - 2008

N2 - Owing to new advances in computer hardware, large text databases have become more prevalent than ever.Automatically mining information from these databases proves to be a challenge due to slow pattern/string matching techniques. In this paper we present a new, fast multi-string pattern matching method based on the well known Aho-Chorasick algorithm. Advantages of our algorithm include:the ability to exploit the natural structure of text, the ability to perform significant character shifting, avoiding backtracking jumps that are not useful, efficiency in terms of matching time and avoiding the typical "sub-string" false positive errors.Our algorithm is applicable to many fields with free text, such as the health care domain and the scientific document field. In this paper, we apply the BSS algorithm to health care data and mine hundreds of thousands of medical concepts from a large Electronic Medical Record (EMR) corpora simultaneously and efficiently. Experimental results show the superiority of our algorithm when compared with the top of the line multi-string matching algorithms.

AB - Owing to new advances in computer hardware, large text databases have become more prevalent than ever.Automatically mining information from these databases proves to be a challenge due to slow pattern/string matching techniques. In this paper we present a new, fast multi-string pattern matching method based on the well known Aho-Chorasick algorithm. Advantages of our algorithm include:the ability to exploit the natural structure of text, the ability to perform significant character shifting, avoiding backtracking jumps that are not useful, efficiency in terms of matching time and avoiding the typical "sub-string" false positive errors.Our algorithm is applicable to many fields with free text, such as the health care domain and the scientific document field. In this paper, we apply the BSS algorithm to health care data and mine hundreds of thousands of medical concepts from a large Electronic Medical Record (EMR) corpora simultaneously and efficiently. Experimental results show the superiority of our algorithm when compared with the top of the line multi-string matching algorithms.

UR - http://www.scopus.com/inward/record.url?scp=73949129806&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=73949129806&partnerID=8YFLogxK

M3 - Article

SP - 424

EP - 428

JO - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

JF - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

SN - 1559-4076

ER -