Cross-language domain adaptation for classifying crisis-related short messages

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

Rapid crisis response requires real-time analysis of messages. After a disaster happens, volunteers attempt to classify tweets to determine needs, e.g., supplies, infrastructure damage, etc. Given labeled data, supervised machine learning can help classify these messages. When labeled data are scarce, however, the trained classifiers perform poorly. Can we reuse old tweets to train classifiers? How can we choose labeled tweets for training? Specifically, we study the usefulness of labeled data from past events. Do labeled tweets in a different language help? We observe the performance of our classifiers trained using different combinations of training sets obtained from past disasters. We perform extensive experimentation on real crisis datasets and show that past labels are useful when both source and target events are of the same type (e.g., both earthquakes). For similar languages (e.g., Italian and Spanish), cross-language domain adaptation was useful; for different languages (e.g., Italian and English), however, performance decreased.
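
The evaluation protocol described in the abstract (train on different combinations of past events, test on a target event) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the classifier or the features, so a small bag-of-words naive Bayes model is swapped in as a stand-in, and the event names, messages, and labels below are invented toy data, not the paper's datasets.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(text):
    # Lowercased whitespace tokens; real tweets would need more careful handling.
    return text.lower().split()


def train_nb(examples):
    """Fit a multinomial naive Bayes model on (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):  # sorted so ties break deterministically
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    hits = sum(predict_nb(model, text) == label for text, label in examples)
    return hits / len(examples)


def evaluate_source_combinations(sources, target):
    """Train on every non-empty combination of source events; test on target."""
    results = {}
    names = sorted(sources)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            train = [ex for name in combo for ex in sources[name]]
            results[combo] = accuracy(train_nb(train), target)
    return results


# Hypothetical toy data: two past events (sources) and one new event (target).
sources = {
    "quake_a": [("need water and food", "needs"),
                ("bridge collapsed road damaged", "damage")],
    "flood_b": [("send blankets and medicine", "needs"),
                ("houses flooded power lines down", "damage")],
}
target = [("we need food", "needs"),
          ("road damaged near bridge", "damage")]

results = evaluate_source_combinations(sources, target)
for combo, acc in results.items():
    print(combo, acc)
```

Comparing the accuracy obtained for each combination shows which past events transfer to the target. The same loop could be run with source events in another language to probe the cross-language question, although a word-overlap model like this one would need shared vocabulary or translation to benefit.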

Original language: English
Title of host publication: ISCRAM 2016 Conference Proceedings - 13th International Conference on Information Systems for Crisis Response and Management
Publisher: Information Systems for Crisis Response and Management, ISCRAM
ISBN (Electronic): 9788460879848
Publication status: Published - 2016
Event: 13th International Conference on Information Systems for Crisis Response and Management, ISCRAM 2016 - Rio de Janeiro, Brazil
Duration: 22 May 2016 to 25 May 2016

Fingerprint

Disasters
Classifiers
Learning systems
Labels
Earthquakes
Language

Keywords

  • Domain adaptation
  • Social media
  • Tweets classification

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management
  • Electrical and Electronic Engineering

Cite this

Imran, M., Mitra, P., & Srivastava, J. (2016). Cross-language domain adaptation for classifying crisis-related short messages. In ISCRAM 2016 Conference Proceedings - 13th International Conference on Information Systems for Crisis Response and Management. Information Systems for Crisis Response and Management, ISCRAM.

@inproceedings{f613f74056384d41a544f190d7bd6d82,
title = "Cross-language domain adaptation for classifying crisis-related short messages",
abstract = "Rapid crisis response requires real-time analysis of messages. After a disaster happens, volunteers attempt to classify tweets to determine needs, e.g., supplies, infrastructure damage, etc. Given labeled data, supervised machine learning can help classify these messages. When labeled data are scarce, however, the trained classifiers perform poorly. Can we reuse old tweets to train classifiers? How can we choose labeled tweets for training? Specifically, we study the usefulness of labeled data from past events. Do labeled tweets in a different language help? We observe the performance of our classifiers trained using different combinations of training sets obtained from past disasters. We perform extensive experimentation on real crisis datasets and show that past labels are useful when both source and target events are of the same type (e.g., both earthquakes). For similar languages (e.g., Italian and Spanish), cross-language domain adaptation was useful; for different languages (e.g., Italian and English), however, performance decreased.",
keywords = "Domain adaptation, Social media, Tweets classification",
author = "Muhammad Imran and Prasenjit Mitra and Jaideep Srivastava",
year = "2016",
language = "English",
booktitle = "ISCRAM 2016 Conference Proceedings - 13th International Conference on Information Systems for Crisis Response and Management",
publisher = "Information Systems for Crisis Response and Management, ISCRAM",

}

TY  - GEN
T1  - Cross-language domain adaptation for classifying crisis-related short messages
AU  - Imran, Muhammad
AU  - Mitra, Prasenjit
AU  - Srivastava, Jaideep
PY  - 2016
Y1  - 2016
AB  - Rapid crisis response requires real-time analysis of messages. After a disaster happens, volunteers attempt to classify tweets to determine needs, e.g., supplies, infrastructure damage, etc. Given labeled data, supervised machine learning can help classify these messages. When labeled data are scarce, however, the trained classifiers perform poorly. Can we reuse old tweets to train classifiers? How can we choose labeled tweets for training? Specifically, we study the usefulness of labeled data from past events. Do labeled tweets in a different language help? We observe the performance of our classifiers trained using different combinations of training sets obtained from past disasters. We perform extensive experimentation on real crisis datasets and show that past labels are useful when both source and target events are of the same type (e.g., both earthquakes). For similar languages (e.g., Italian and Spanish), cross-language domain adaptation was useful; for different languages (e.g., Italian and English), however, performance decreased.
KW  - Domain adaptation
KW  - Social media
KW  - Tweets classification
UR  - http://www.scopus.com/inward/record.url?scp=85015720266&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85015720266&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:85015720266
BT  - ISCRAM 2016 Conference Proceedings - 13th International Conference on Information Systems for Crisis Response and Management
PB  - Information Systems for Crisis Response and Management, ISCRAM
ER  -