Prominent existing approaches for detection of phishing websites can be categorized as follows:
Detect and block phishing websites manually in time
Detecting phishing webpages manually is one of the most common approaches. The user needs to be aware of the various kinds of phishing attacks, and prior knowledge is essential for identifying such webpages in real time. Williams and Li (2017) proposed a model built on the ACT-R cognitive behavior architecture that analyzes how the authenticity of a webpage is judged from the HTTPS padlock security indicator. Afroz and Greenstadt (2011) introduced a technique called ‘PhishZoo’, which uses site profiling and profile matching in the detection process. It maintains a list of sensitive websites and compares each loaded website against that list. The approach is therefore based mainly on matching the content of a suspect webpage against that of the legitimate one.
Detection based on URL and content of websites
Detection methods based on the URL use various characteristics of a website's URL to filter out phishing websites. Ma et al. (2009) apply online learning to the lexical and host-based properties of URLs to identify phishing websites.
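To make the notion of lexical URL properties concrete, the following is a minimal sketch in Python. It is not the feature set of Ma et al. (2009); the function name `lexical_features` and the particular features chosen are assumptions made for illustration, and host-based properties (which require external lookups) are left out.

```python
import ipaddress
import re
from urllib.parse import urlparse


def _is_ip_address(host: str) -> bool:
    """Return True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url: str) -> dict:
    """Extract a few illustrative lexical features from a URL string.

    The feature names are hypothetical; a real system would feed many
    more such features to a trained classifier.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "dots_in_host": host.count("."),
        "hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
        "host_is_ip": _is_ip_address(host),
        # Split the path on common delimiters to obtain word-like tokens.
        "path_tokens": [t for t in re.split(r"[/\-._?=&]", parsed.path) if t],
    }


if __name__ == "__main__":
    print(lexical_features("http://secure-paypal.example.com/signin/update.php"))
```

A classifier trained online would consume such feature dictionaries one URL at a time; the sketch covers only the extraction step.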
Content-based detection compares the content of the website viewed by the user with that of the original one. Mao et al. (2017) proposed a system that detects phishing by analyzing the similarity of components across websites. The method uses URL tokens to improve the accuracy of predicting illegitimate websites and, in addition, compares the CSS rules of the legitimate and suspect websites to identify the phishing one. Futai et al. (2016) use a graph mining technique to detect phishing webpages. This method can detect phishing websites that URL analysis alone cannot. It accounts for the repeated interaction between websites and users: from the statistics of these interactions it generates the AD-URL graph, which is then used to detect phishing websites.
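The idea of comparing CSS rules between a legitimate and a suspect page can be illustrated with a simple set-overlap measure. The sketch below is an assumption-laden simplification and not the method of Mao et al. (2017): it parses flat CSS with a regular expression (no nested at-rules or comments) and scores two stylesheets by Jaccard similarity. The names `css_declarations` and `jaccard` are hypothetical.

```python
import re

# Matches "selector { body }" for flat CSS without nested blocks (assumption).
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def css_declarations(css: str) -> set:
    """Return the set of (selector, property, value) triples in a stylesheet."""
    declarations = set()
    for selector, body in RULE_RE.findall(css):
        selector = " ".join(selector.split())  # normalize whitespace
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                declarations.add(
                    (selector, prop.strip().lower(), value.strip().lower())
                )
    return declarations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    legitimate = "body { color: #333; margin: 0 } .login { width: 300px; }"
    suspect = "body{color:#333;margin:0}\n.login{width:300px}"
    score = jaccard(css_declarations(legitimate), css_declarations(suspect))
    print(f"CSS similarity: {score:.2f}")
```

A high similarity score for a page hosted on an unrelated domain would be one signal, among others, that the page imitates the legitimate site.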
Block phishing e-mails using spam filter software
Email attacks are a major route leading users to phishing websites, and spam filters are an effective way to keep users from clicking links in spam emails. Spam filters detect the large majority of malicious spam emails so that they are not delivered to inboxes. Roy et al. (2013) developed a spam-filtering technique that uses a Naive Bayes classifier, which makes its prediction by analyzing the contents of legitimate and illegitimate mails, and it achieved an accuracy of 85%. Pandey and Ravi (2013) proposed a technique that uses the URL and the source code of a website to gather information on dissimilarities, performs text analysis on the gathered information, and finally makes a prediction.
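As an illustration of how a Naive Bayes classifier separates legitimate from illegitimate mail by content, here is a minimal multinomial Naive Bayes sketch with Laplace smoothing, written with the standard library only. It is not the implementation of Roy et al. (2013); the class name, the tokenizer, and the toy training mails are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for document, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(document))
        self.vocabulary = set().union(*self.word_counts.values())
        return self

    def predict(self, document):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # log P(class) + sum of log P(word | class)
            score = math.log(self.doc_counts[c] / total_docs)
            total_words = sum(self.word_counts[c].values())
            for word in tokenize(document):
                score += math.log(
                    (self.word_counts[c][word] + 1) / (total_words + vocab_size)
                )
            if score > best_score:
                best_class, best_score = c, score
        return best_class


if __name__ == "__main__":
    mails = [
        "verify your account password now",
        "urgent account suspended click link",
        "meeting agenda for monday",
        "lunch on friday with the team",
    ]
    labels = ["phish", "phish", "ham", "ham"]
    model = NaiveBayesTextClassifier().fit(mails, labels)
    print(model.predict("please verify your password"))
```

Working in log space avoids numerical underflow on long mails, and the smoothing term keeps a single unseen word from driving a class probability to zero.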
Hu et al. (2016) proposed a technique that analyzes server log information to identify phishing websites. When a user visits an illegitimate webpage, the browser requests resources from the real website that the illegitimate page references. The legitimate website's server records these requests in its log, and the log is later used to identify the illegitimate websites. Wu et al. (2019) proposed a technique that combines fuzzy logic with machine learning, eliminating the use of a Boolean algorithm in the system. It uses the domain name, the sub-domain name, and the lifetime of the webpage in the authentication process.
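The server-log idea described above can be sketched as a scan of the legitimate server's access log for requests whose Referer header points to a foreign host. This is a simplified illustration, not the algorithm of Hu et al. (2016). It assumes the log is in the Apache combined log format, and the function name `foreign_referers` and all host names are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Request line, status, size, then the quoted Referer field (combined format).
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"'
)


def foreign_referers(log_lines, own_hosts) -> Counter:
    """Count requests per referring host that is not one of our own hosts."""
    counts = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if match is None:
            continue  # line is not in the expected format
        host = urlparse(match.group("referer")).hostname
        if host and host not in own_hosts:
            counts[host] += 1
    return counts


if __name__ == "__main__":
    sample_log = [
        '203.0.113.5 - - [10/Oct/2016:13:55:36 +0000] "GET /img/logo.png HTTP/1.1" '
        '200 2326 "http://bank-secure.evil.example/login.html" "Mozilla/5.0"',
        '203.0.113.9 - - [10/Oct/2016:13:56:01 +0000] "GET /img/logo.png HTTP/1.1" '
        '200 2326 "http://www.bank.example/" "Mozilla/5.0"',
    ]
    print(foreign_referers(sample_log, {"bank.example", "www.bank.example"}))
```

Hosts that appear in the result are only candidates: legitimate partners and search engines also send foreign referers, so a real system would need further filtering before flagging a host as phishing.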
Anti-phishing software contains code that identifies phishing websites and other forms of attack used to access user data. It typically blocks the content and shows a warning to the user. Anti-virus and anti-malware programs fall into this category. Armano et al. (2016) proposed a real-time method that detects phishing websites through a browser add-on or extension. It extracts information from the websites visited by the user and displays a warning message on the screen if a website is identified as phishing. Marchal et al. (2017) proposed a similar real-time browser extension for the Firefox browser.
Other detection methods
Mei et al. (2016) proposed a technique that extracts features from a website and uses a support vector machine classifier to predict the website's authenticity. The model is first trained and then tested on various test cases.
Hawanna et al. (2016) proposed a system that uses a novel algorithm to detect phishing websites. It applies several tests, such as Alexa ranking and blacklist search, and works well for websites that use the HTTP protocol. Sahingoz et al. (2019) used several classification algorithms together with NLP to detect phishing websites in real time, achieving an accuracy of about 97.98%.
Disadvantages of the existing phishing detection methods
Source – HR, M., MV, A., S, G. et al. Development of anti-phishing browser based on random forest and rule of extraction framework