ABSTRACT:
Web crawling is what made large-scale search engines possible. A web crawler behaves much like a spider, which is why it is often called one: it builds a net of links across the web and traps information within that net. This paper investigates how information on the web can be crawled, the basic architecture of a focused web crawler, and how the collected data is stored and used to make decisions that curb cybersecurity issues. The paper adopts a review approach, studying the features that popular search engines use to gather information, and follows software engineering principles to develop an algorithm and a pseudocode for a web crawler. Conventional crawling relies on two traversal techniques for dealing with data: Breadth-First Search (BFS) and Depth-First Search (DFS). Finally, the developed pseudocode demonstrates how crawling can be viewed as a brute-force attack and how such behavior can be avoided. Cybersecurity is a major concern in Industry 4.0, and it could be addressed by implementing a focused crawler that fetches data from the web, indexes it, and builds a knowledge domain for future reference.
Keywords— Web-crawler, pseudocode, BFS, DFS, Search Engine, Cybersecurity
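As a concrete illustration of the BFS and DFS traversal mentioned above, the sketch below (not the paper's pseudocode; the strategy flag, page limit, and delay value are illustrative assumptions) shows how a single crawl frontier switches between BFS and DFS simply by treating it as a queue or a stack, and how a politeness delay keeps the crawl from resembling a brute-force attack on the target host.

```python
# Minimal crawl-frontier sketch: BFS vs. DFS ordering plus a politeness delay.
# This is an assumption-laden illustration, not the crawler developed in the paper.
import re
import time
from collections import deque
from urllib.parse import urljoin
from urllib.request import urlopen

LINK_RE = re.compile(r'href="(http[^"]+)"')  # crude link extraction, for the sketch only

def crawl(seed, strategy="bfs", max_pages=20, delay=1.0):
    frontier = deque([seed])   # BFS treats this as a FIFO queue; DFS as a LIFO stack
    visited = set()
    while frontier and len(visited) < max_pages:
        url = frontier.popleft() if strategy == "bfs" else frontier.pop()
        if url in visited:
            continue
        visited.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", errors="ignore")
        except Exception:
            continue           # skip unreachable or non-HTML pages
        for link in LINK_RE.findall(html):
            frontier.append(urljoin(url, link))
        time.sleep(delay)      # politeness delay: avoid hammering the host like a brute-force attack
    return visited

# Example (hypothetical seed URL): crawl("https://example.com", strategy="dfs")
```

The only difference between the two strategies is which end of the frontier is popped; everything else (fetching, link extraction, visited-set bookkeeping, and rate limiting) stays the same.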