- This is a basic web spider, written for learning purposes and to release something to the open-source community.
- It is also very simple, and I'll try to keep it that way. You provide a start URL (the first website to crawl); the spider finds all the URLs on that site and puts them into an array. It then checks the sanity of each URL with some basic regex tests (for example, that the URL starts with http and ends with a TLD). All of this runs in a while loop to collect hundreds of URLs, and a simple system tracks the visited URLs so that no work is repeated.
- Search for keywords in a site.
- Categorize a site according to its keywords.
- A long etc.
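
The crawl loop described above (collect URLs, sanity-check them with a regex, skip the ones already visited) could look roughly like the sketch below. It is not the project's actual code: the names `crawl`, `extract_urls`, `fetch_html`, and `max_urls` are made up for illustration, and the regex is only one loose reading of "starts with http, ends with a TLD".

```python
import re
from collections import deque
from http.client import HTTPException
from urllib.request import urlopen

# Assumption: links are taken from href="..." attributes only.
HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)
# Assumption: "sane" means http(s)://host.tld with an optional trailing slash.
URL_RE = re.compile(r'^https?://[A-Za-z0-9.-]+\.[A-Za-z]{2,}/?$')


def fetch_html(url):
    """Download a page and return its text, or '' if the request fails."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, HTTPException):
        return ""


def extract_urls(html):
    """Return every href value in the HTML that passes the sanity regex."""
    return [u for u in HREF_RE.findall(html) if URL_RE.match(u)]


def crawl(start_url, fetch=fetch_html, max_urls=100):
    """Visit pages breadth-first and return the URLs in the order visited."""
    seen = {start_url}            # every URL already queued, so nothing repeats
    pending = deque([start_url])  # URLs waiting to be visited
    collected = []
    while pending and len(collected) < max_urls:
        url = pending.popleft()
        collected.append(url)
        for link in extract_urls(fetch(url)):
            if link not in seen:
                seen.add(link)
                pending.append(link)
    return collected
```

Calling `crawl("http://example.com", max_urls=50)` would start from that (placeholder) site and stop after at most 50 pages. The `fetch` parameter exists only so the loop can be tried with canned HTML instead of real network requests.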
Email: [email protected]