This repository has been archived by the owner on Aug 25, 2023. It is now read-only.
Web Spider

Crawling the web...


- This is a basic web spider, written for learning purposes and to release something to the open-source community.

How it Works

How it is done...


- It is very simple, and I will try to keep it that way. You give it a start site (the first site to crawl), and it finds all the URLs on that site and puts them into an array. It then checks the sanity of each URL with some basic regex tests (for example, that the URL starts with http and ends with a TLD). All of this runs in a while loop that collects hundreds of URLs, with a simple system that tracks visited URLs so that no work is repeated.
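
The loop described above could look roughly like the following Python sketch. It is not the project's actual code: the function names (`extract_urls`, `is_sane`, `fetch_page`, `crawl`), the two regexes, and the `max_urls` limit are all assumptions made for illustration. The "ends with a TLD" check is applied here to the host name, so that links with a path are still accepted.

```python
import re
from collections import deque
from urllib.parse import urlparse
from urllib.request import urlopen

# Hypothetical patterns; the project's real regexes are not shown in this README.
HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)
TLD_RE = re.compile(r"\.[a-z]{2,}$", re.IGNORECASE)


def extract_urls(html):
    """Return every href value found in the page, in document order."""
    return HREF_RE.findall(html)


def is_sane(url):
    """Basic check: http(s) scheme and a host name that ends with a TLD."""
    try:
        parts = urlparse(url)
        host = parts.hostname or ""
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(TLD_RE.search(host))


def fetch_page(url):
    """Download a page as text; a failed request yields an empty page."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def crawl(start_url, fetch=fetch_page, max_urls=100):
    """Collect up to max_urls distinct URLs reachable from start_url."""
    seen = {start_url}           # the visited check: each URL is queued once
    found = [start_url]          # the array of collected URLs, in order
    queue = deque([start_url])   # pages still waiting to be fetched
    while queue and len(found) < max_urls:
        url = queue.popleft()
        for link in extract_urls(fetch(url)):
            if len(found) >= max_urls:
                break
            if is_sane(link) and link not in seen:
                seen.add(link)
                found.append(link)
                queue.append(link)
    return found
```

Calling `crawl("http://example.com", max_urls=200)` would fetch pages over the network until 200 URLs are collected or no new links remain. The `fetch` parameter is there only so the loop can be tried with canned pages instead of real requests.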

Future features

Features I want to implement.


- Search for keywords in a site.
- Categorize a site according to its keywords.
- And a long list of other things.


Email: [email protected]