University of Tennessee*, Sungkyunkwan University†, University of Maryland‡
1 Kiho Lee and Kyungchan Lim contributed equally to this research.
Phishing attacks continue to be a major threat to internet users, causing data breaches, financial losses, and identity theft. This study provides an in-depth analysis of the lifespan and evolution of phishing websites, focusing on their survival strategies and evasion techniques. We analyze 286,237 unique phishing URLs over five months using a custom web crawler based on Puppeteer and Chromium. Our crawler runs on a 30-minute cycle, systematically checking the operational status of phishing websites by collecting their HTTP status codes, screenshots, HTML, and HTTP data. Temporal and survival analyses, along with statistical tests, are used to examine phishing website lifecycles, evolution, and evasion tactics. Our findings show that the average lifespan of phishing websites is 54 hours (2.25 days) with a median of 5.46 hours, indicating rapid takedown of many sites while a subset remains active longer. Interestingly, logistic-themed phishing websites (e.g., USPS) operate within a compressed timeframe (1.76 hours) compared to other brands (e.g., Facebook). We further analyze detection effectiveness using Google Safe Browsing (GSB). We find that GSB detects only 18.4% of phishing websites, taking an average of 4.5 days. Notably, 83.93% of phishing sites are already taken down before GSB detection, meaning GSB requires more prompt detection. Moreover, 16.07% of phishing sites persist beyond this point, surviving for an additional 7.2 days on average, resulting in an average total lifespan of approximately 12 days. We reveal that DNS resolution error is the main cause (67%) of phishing website takedowns. Finally, we uncover that phishing sites with extensive visual changes (more than 100 times) exhibit a median lifespan of 17 days, compared to 1.93 hours for those with minimal modifications. These results highlight the dynamic nature of phishing attacks, the challenges in detection and prevention, and the need for more rapid and comprehensive countermeasures against evolving phishing tactics.
Source code is publicly available in this GitHub repository.
Also, we share our phishing dataset that we have collected (e.g., index.html, screenshots) for 158 days (April, 2024 - September, 2024). Specifically, our Web crawler (implemented using Puppeteer and Chromium) visits each phishing websites and collects the landing page of each domain every 30 minutes. This dataset consists of 286,237 phishing URLs and a total of 2.7M visits. If you want to download the dataset, please contact us through this Google form.