Publication Date



Phishing is a growing problem with serious financial and security consequences for those entrapped by it. This paper examines the evolution, over the past seven years, of schemes for classifying links and URLs as phishing-related. Early techniques relied on expertly hand-selected link/URL features, plausible but ad hoc heuristics, the context of the embedding email or web page, and often blacklists and whitelists. Over time, many of these techniques have lost much of their potency as phishers' methods have grown more sophisticated. Consequently, more recent work relies more heavily on automating feature selection using text classification techniques. Our work involves building a URL classifier and using it to test the various schemes and general feature classes to determine their efficacy and potential for improvement. We use publicly available blacklists and whitelists of URLs for both training and testing the classifier. Our results indicate that a simple approach is superior to an over-complicated separation of feature sets.

Document Type

Open Access Conference Paper

Access Rights

Open Access