From data collection to analysis - Exploring regional linguistic variation in route directions by spatially-stratified web sampling

Sen Xu, Anuj Jaiswal, Xiao Zhang, Alexander Klippel, Prasenjit Mitra, Alan MacEachren

Research output: Contribution to journal › Conference article

2 Citations (Scopus)

Abstract

How does spatial language vary regionally? This study investigates the possibility of exploring regional linguistic variation in spatial language by collecting and analyzing a Spatially-strAtified Route Direction Corpus (SARD Corpus) from volunteered spatial language text on the Web. Because the World Wide Web makes content sharing fast, it has quickly become a hotbed for volunteered spatial language text, such as directions on hotels' websites. These route directions can serve as a representation of everyday spatial language usage on the WWW. The spatial coverage and abundance of this data source are appealing, but collecting and analyzing large quantities of spatially distributed data remains challenging. By automatically crawling, classifying, and geo-referencing web documents that contain route directions, we have built the SARD Corpus, which covers the U.S., the U.K., and Australia. We implement a semantic categorical analysis scheme to explore regional variation in the use of cardinal versus relative directions. Preliminary results show both similarities and differences at the national level and geographic patterns at the regional level. The design and implementation of a geo-referenced, large-scale corpus built from Web documents offer a methodological contribution to corpus linguistics, spatial cognition, and the GISciences.
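
As a rough illustration of what a cardinal-versus-relative comparison could look like, the following Python sketch counts direction terms per region. It is not the authors' analysis scheme: the term lists, the function names `count_direction_terms` and `cardinal_share_by_region`, and the sample texts are all hypothetical, and the abstract does not specify the actual semantic categories used.

```python
import re
from collections import Counter

# Hypothetical term lists for illustration only; the abstract does not
# give the categories used in the actual analysis scheme.
CARDINAL_TERMS = {"north", "south", "east", "west",
                  "northeast", "northwest", "southeast", "southwest"}
RELATIVE_TERMS = {"left", "right", "straight"}


def count_direction_terms(text):
    """Count cardinal and relative direction words in one route description."""
    # Naive tokenization: lowercase alphabetic runs. Ambiguous words such as
    # "right" in "right after the bridge" are counted as relative terms.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        if token in CARDINAL_TERMS:
            counts["cardinal"] += 1
        elif token in RELATIVE_TERMS:
            counts["relative"] += 1
    return counts


def cardinal_share_by_region(docs):
    """Given (region, text) pairs, return the cardinal share per region.

    The share is cardinal / (cardinal + relative), or None when a region
    has no direction terms at all.
    """
    totals = {}
    for region, text in docs:
        totals.setdefault(region, Counter()).update(count_direction_terms(text))
    shares = {}
    for region, counts in totals.items():
        n = counts["cardinal"] + counts["relative"]
        shares[region] = counts["cardinal"] / n if n else None
    return shares


# Made-up sample texts, not drawn from the SARD Corpus.
sample_docs = [
    ("US", "Head north on Main St, then turn left."),
    ("UK", "Turn right, then left at the roundabout."),
]
shares = cardinal_share_by_region(sample_docs)
```

A real analysis would need geo-referenced documents and a more careful treatment of word sense; this sketch only shows the counting step under the stated assumptions.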

Original language: English
Pages (from-to): 49-52
Number of pages: 4
Journal: CEUR Workshop Proceedings
Volume: 620
Publication status: Published - 1 Dec 2010
Event: Workshop on Computational Models of Spatial Language Interpretation at Spatial Cognition 2010, COSLI 2010 - Portland, OR, United States
Duration: 15 Aug 2010 → 15 Aug 2010

Keywords

  • Cardinal directions
  • Geo-referenced web sampling
  • Regional linguistic variation
  • Spatial language analysis
  • Volunteered spatial information

ASJC Scopus subject areas

  • Computer Science (all)
