What’s in Phishers: A Longitudinal Study of Security Configurations in Phishing Websites and Kits

Kyungchan Lim*, Kiho Lee*, Fujiao Ji*, Yonghwi Kwon†, Hyoungshick Kim‡, Doowon Kim*
University of Tennessee*, University of Maryland†, Sungkyunkwan University‡

Abstract

Phishing attacks pose a significant threat to Internet users. Understanding the security posture of phishing infrastructure is crucial for developing effective defense strategies, as it helps identify potential weaknesses that attackers might exploit. Despite extensive research, there may still be a gap in fully understanding these security weaknesses. To address this important issue, this paper presents a longitudinal study of security configurations and vulnerabilities in phishing websites and associated kits. We focus on two main areas: (1) analyzing the security configurations of phishing websites and servers, particularly HTTP headers and application-level security, and (2) examining the prevalence and types of vulnerabilities in phishing kits. We analyze data from 906,731 distinct phishing websites collected over 2.5 years, covering HTML headers, client-side resources, and phishing kits. Our findings suggest that phishing websites often employ weak security configurations, with 88.8% of the 13,344 collected phishing kits containing at least one potential vulnerability, and 12.5% containing backdoor vulnerabilities. These vulnerabilities present an opportunity for defenders to shift from passive defense to active disruption of phishing operations. Our research proposes a new approach to leverage weaknesses in phishing infrastructure, allowing defenders to take proactive actions to disable phishing sites earlier and reduce their effectiveness.
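As a rough illustration of the kind of HTTP header check described above, the following sketch reports which commonly recommended security headers are absent from a response. It is a minimal example, not the study's actual tooling: the `SECURITY_HEADERS` list, the `missing_security_headers` function, and the sample response are all assumptions made for illustration.

```python
# Illustrative list of commonly recommended security headers. This is an
# assumption, not the set of headers examined in the paper.
SECURITY_HEADERS = (
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
)


def missing_security_headers(response_headers):
    """Return the entries of SECURITY_HEADERS that are absent from a response.

    HTTP header names are case-insensitive, so names are compared in lowercase.
    """
    present = {name.lower() for name in response_headers}
    return [h for h in SECURITY_HEADERS if h.lower() not in present]


# Hypothetical response headers from a crawled landing page.
example = {"Content-Type": "text/html", "x-frame-options": "DENY"}
print(missing_security_headers(example))
```

A site that sets none of these headers would return the full list, which is one simple way to flag a weak configuration.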

Dataset and Source Code

Source code is publicly available in this GitHub repository.

We also share the phishing dataset (e.g., index.html, screenshots) that we collected over 31 months (July 2021 to January 2024). Our web crawler, implemented using Chromium, visits each phishing website and collects the landing page of each domain. The dataset consists of 906,731 phishing websites and a total of 16.7M URLs. To download the dataset, please contact us through this Google form.
