ABSTRACT
Embedding malicious URLs in e-mails is one of the most common web threats facing the internet community today. Malicious URLs are widely used to mount cyber-attacks such as spear phishing, pharming, phishing, and malware distribution. Attackers falsely claim to be a trustworthy entity and lure users into clicking these compromised links and divulging vital information such as usernames, passwords, or credit card details, leaving them exposed to identity theft. Detecting malicious URLs in e-mails is therefore essential to help internet users adopt safe practices and to prevent them from becoming victims of fraud. This paper explores how malicious links in e-mails can be detected from the lexical and host-based features of their URLs to protect users from identity theft attacks. The research uses a Naïve Bayesian classifier as a probabilistic model to decide whether a URL is malicious or legitimate. The classifier counts the occurrences of each feature in an e-mail and calculates a cumulative score. If the cumulative score is greater than a given threshold, the URL is considered malicious; otherwise, it is considered legitimate.
Keywords: Malicious URLs; Pharming; Phishing; Attacks; Naïve Bayesian classifier; Threshold
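The scoring scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the four lexical and host-based features, their likelihood values, the prior, and the threshold below are all hypothetical placeholders chosen for illustration, and the cumulative score is assumed to be a Naïve Bayes log-odds sum over binary features.

```python
import math
from urllib.parse import urlparse

# Hypothetical likelihoods: (P(feature | malicious), P(feature | legitimate)).
# These numbers are illustrative assumptions, not values from the paper.
LIKELIHOODS = {
    "has_ip_host": (0.30, 0.01),
    "has_at_symbol": (0.20, 0.02),
    "many_subdomains": (0.40, 0.10),
    "long_url": (0.50, 0.20),
}
PRIOR_MALICIOUS = 0.5  # assumed prior probability that a URL is malicious
THRESHOLD = 0.0        # assumed decision threshold on the log-odds score


def extract_features(url):
    """Return a dict of binary features for one URL (hypothetical feature set)."""
    host = urlparse(url).hostname or ""
    is_ip = host.replace(".", "").isdigit()  # crude IPv4 check
    return {
        "has_ip_host": is_ip,
        "has_at_symbol": "@" in url,
        "many_subdomains": (not is_ip) and host.count(".") >= 3,
        "long_url": len(url) > 75,
    }


def score(url):
    """Cumulative Naive Bayes log-odds that the URL is malicious."""
    log_odds = math.log(PRIOR_MALICIOUS / (1 - PRIOR_MALICIOUS))
    for name, present in extract_features(url).items():
        p_mal, p_leg = LIKELIHOODS[name]
        if present:
            log_odds += math.log(p_mal / p_leg)
        else:
            log_odds += math.log((1 - p_mal) / (1 - p_leg))
    return log_odds


def is_malicious(url):
    """Flag the URL when its cumulative score exceeds the threshold."""
    return score(url) > THRESHOLD


if __name__ == "__main__":
    for u in ("http://192.168.10.5/login@secure/update", "https://www.example.com/"):
        print(u, round(score(u), 2), is_malicious(u))
```

Working in log-odds keeps the score additive, so each feature contributes independently and the threshold can be moved to trade false positives against missed detections. In practice the likelihoods would be estimated from labeled e-mail data rather than set by hand.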