A domain is only as good as its buddies

Detecting stealthy malicious domains via graph inference

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Inference-based techniques are one of the major approaches to analyzing DNS data and detecting malicious domains. The key idea is to first define associations between domains based on features extracted from DNS data, and then to deploy an inference algorithm that identifies potentially malicious domains through their direct or indirect associations with known malicious ones. How associations are defined is key to the effectiveness of an inference technique. An association scheme should be both accurate (i.e., avoid falsely associating domains that have no meaningful connection) and have good coverage (i.e., identify all associations between domains that do have meaningful connections). Because DNS data provides only a limited scope of information, designing an association scheme that achieves both high accuracy and good coverage is a challenge. In this paper, we propose a new approach to identifying domains controlled by the same entity. Our key idea is an in-depth analysis of active DNS data to accurately separate public IPs from dedicated ones, which enables us to build high-quality associations between domains. Our scheme avoids the pitfall of naive approaches that rely on the weak “co-IP” relationship between domains (i.e., two domains resolve to the same IP), which results in low detection accuracy, while also identifying many meaningful connections between domains that existing state-of-the-art approaches discard. Our experimental results show that the proposed approach not only significantly improves domain coverage compared to existing approaches but also achieves better detection accuracy. Existing path-based inference algorithms are designed specifically for DNS data analysis; they are effective but computationally expensive. To further demonstrate the strength of our domain association scheme and to improve inference efficiency, we construct a new domain-IP graph that works well with the generic belief propagation algorithm. Through comprehensive experiments, we show that this approach offers significant efficiency and scalability improvements with only a minor impact on detection accuracy, which suggests that the combination could offer a good tradeoff for malicious domain detection in practice.
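
The pipeline described in the abstract (associate domains through the IPs they resolve to, set aside public IPs, then propagate maliciousness from known-bad domains) can be illustrated with a small Python sketch. It is not the paper's implementation. The abstract does not say how public and dedicated IPs are told apart, so the sketch assumes a crude cutoff on the number of domains per IP (`max_domains_per_ip`, a hypothetical parameter). The inference step is textbook sum-product loopy belief propagation with a simple homophily edge potential; `eps`, the priors, and all domain names and addresses are likewise made up for illustration.

```python
from collections import defaultdict


def build_graph(resolutions, max_domains_per_ip=3):
    """Build an undirected domain-IP graph from (domain, ip) pairs.

    ASSUMPTION: an IP that serves more than `max_domains_per_ip` domains is
    treated as public (e.g. shared hosting) and dropped. This cutoff is a
    stand-in for illustration, not the paper's classification method.
    """
    domains_by_ip = defaultdict(set)
    for domain, ip in resolutions:
        domains_by_ip[ip].add(domain)

    graph = defaultdict(set)
    for ip, domains in domains_by_ip.items():
        if len(domains) > max_domains_per_ip:
            continue  # skip IPs assumed to be public
        for domain in domains:
            graph[domain].add(ip)
            graph[ip].add(domain)
    return graph


def belief_propagation(graph, priors, eps=0.1, iterations=10):
    """Return an estimated P(malicious) per node via sum-product loopy BP.

    States are 0 (benign) and 1 (malicious). `priors` maps a node to its
    prior P(malicious); nodes without an entry default to 0.5.
    """
    def phi(node):
        p = priors.get(node, 0.5)
        return (1.0 - p, p)

    # Homophily edge potential: neighbours tend to share the same state.
    psi = ((0.5 + eps, 0.5 - eps), (0.5 - eps, 0.5 + eps))

    # One message per directed edge, initialised to uniform.
    messages = {(u, v): (0.5, 0.5) for u in graph for v in graph[u]}

    for _ in range(iterations):
        updated = {}
        for (u, v) in messages:
            prior = phi(u)
            out = []
            for state_v in (0, 1):
                total = 0.0
                for state_u in (0, 1):
                    incoming = 1.0
                    for w in graph[u]:
                        if w != v:
                            incoming *= messages[(w, u)][state_u]
                    total += prior[state_u] * psi[state_u][state_v] * incoming
                out.append(total)
            norm = out[0] + out[1]
            updated[(u, v)] = (out[0] / norm, out[1] / norm)
        messages = updated

    beliefs = {}
    for u in graph:
        benign, malicious = phi(u)
        for w in graph[u]:
            benign *= messages[(w, u)][0]
            malicious *= messages[(w, u)][1]
        beliefs[u] = malicious / (benign + malicious)
    return beliefs


# Hypothetical resolution data: one small dedicated IP, one crowded IP.
resolutions = [
    ("evil.example", "203.0.113.5"),
    ("suspect.example", "203.0.113.5"),
    ("blog-a.example", "198.51.100.7"),
    ("blog-b.example", "198.51.100.7"),
    ("blog-c.example", "198.51.100.7"),
    ("blog-d.example", "198.51.100.7"),
    ("evil.example", "198.51.100.7"),
]
graph = build_graph(resolutions)
beliefs = belief_propagation(graph, priors={"evil.example": 0.99})
for node in sorted(beliefs):
    print(node, round(beliefs[node], 4))
```

In this toy data, the crowded address is dropped, so the unrelated blog domains never become associated with the known-malicious domain, while the domain sharing the small IP with it ends up with a belief above the neutral 0.5. Treating every shared IP as an association (the naive "co-IP" approach) would instead link all of the blog domains to the malicious one.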

Original language: English
Title of host publication: CODASPY 2018 - Proceedings of the 8th ACM Conference on Data and Application Security and Privacy
Publisher: Association for Computing Machinery, Inc
Pages: 330-341
Number of pages: 12
Volume: 2018-January
ISBN (Electronic): 9781450356329
DOI: https://doi.org/10.1145/3176258.3176329
Publication status: Published - 13 Mar 2018
Event: 8th ACM Conference on Data and Application Security and Privacy, CODASPY 2018 - Tempe, United States
Duration: 19 Mar 2018 to 21 Mar 2018

Keywords

  • Belief Propagation
  • DNS Data
  • Domain Association
  • Graph Inference
  • Malicious Domains

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Software

Cite this

Khalil, I., Guan, B., Nabeel, M., & Yu, T. (2018). A domain is only as good as its buddies: Detecting stealthy malicious domains via graph inference. In CODASPY 2018 - Proceedings of the 8th ACM Conference on Data and Application Security and Privacy (Vol. 2018-January, pp. 330-341). Association for Computing Machinery, Inc. https://doi.org/10.1145/3176258.3176329

@inproceedings{ef1757835ca54870ad0ee8ad8880a82d,
title = "A domain is only as good as its buddies: Detecting stealthy malicious domains via graph inference",
keywords = "Belief Propagation, DNS Data, Domain Association, Graph Inference, Malicious Domains",
author = "Issa Khalil and Bei Guan and Mohamed Nabeel and Ting Yu",
year = "2018",
month = "3",
day = "13",
doi = "10.1145/3176258.3176329",
language = "English",
volume = "2018-January",
pages = "330--341",
booktitle = "CODASPY 2018 - Proceedings of the 8th ACM Conference on Data and Application Security and Privacy",
publisher = "Association for Computing Machinery, Inc",

}
