Conference Paper
Detecting SMS spam in the age of legitimate bulk messaging
Bradley Reaves, Logan Blue, Dave Tian, Patrick Traynor, and Kevin R. B. Butler
Proceedings of the ACM Conference on Security and Privacy in Wireless and Mobile Networks, 2016
Shows legitimate bulk messages like verification codes collapse SMS spam filter recall to 23%, and releases the largest public SMS spam dataset to date.
Abstract
[...] impact on the efficacy of SMS spam filtering. Because legitimate bulk messages have characteristics similar to spam, including the ubiquity of a number (like a short code or one-time password) or a URL, as well as a call to action (“click here”), we hypothesize that SMS spam filters will need to change to account for a new messaging paradigm. In this paper, we leverage a dataset of nearly 400,000 messages collected over the course of 14 months. We obtain this data by crawling public SMS gateways. Users rely on these public gateways to receive legitimate SMS verification messages as well as to avoid having their actual phone numbers exposed to lists that receive spam. We rely on this data to make the following contributions:
• Release Largest Public Dataset: We release a labeled dataset of bulk messaging and SMS spam, which is larger than any previously published spam dataset by nearly an order of magnitude.
• Weaknesses in Previous Datasets: We show that existing SMS spam/ham corpora do not sufficiently reflect the prevalence of bulk messages in modern SMS communications, preventing effective SMS spam detection. Specifically, we demonstrate that previously proposed mechanisms trained on such datasets exhibit extremely poor results (e.g., 23% recall) in the presence of such messages.
• Characterization of SMS Spam Campaign: We provide deeper insight into ongoing SMS spam campaigns, including both topic and network analysis. We find that the number of messages sent in a campaign is best explained by the volume of sending numbers available to the campaign.
Citation (IEEE)
B. Reaves, L. Blue, D. Tian, P. Traynor, and K. R. B. Butler, “Detecting SMS spam in the age of legitimate bulk messaging,” in Proceedings of the ACM Conference on Security and Privacy in Wireless and Mobile Networks, 2016.
BibTeX
@inproceedings{rbt+16,
author = {{Bradley Reaves} and {Logan Blue} and {Dave Tian} and {Patrick Traynor} and {Kevin R. B. Butler}},
booktitle = {{Proceedings of the ACM Conference on Security and Privacy in Wireless and Mobile Networks}},
date = {2016-07},
keywords = {short},
title = {Detecting {SMS} spam in the age of legitimate bulk messaging},
}