Web Spider

Crawling the web...

- This is a basic web spider, written for learning purposes and to release something to the open-source community.

How it Works

How it is done...

- It is very simple, and I'll try to keep it that way. You provide a start URL (the first site to crawl). The spider finds all the URLs on that page and puts them into an array. It then checks each URL for sanity with some basic regex tests (does the URL start with http, does it end with a TLD). All of this runs in a while loop that collects hundreds of URLs, with a simple system that tracks visited URLs so that no work is repeated.
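The loop described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in Crawler.py: the names `fetch`, `extract_urls`, `crawl`, `max_pages`, and both regexes are assumptions made for the example, and the sanity check is deliberately basic (for instance, it rejects hosts with a port number).

```python
import re
from collections import deque
from urllib.request import urlopen

# Hypothetical patterns for illustration; not taken from Crawler.py.
HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)
# "Sane" here means: starts with http(s):// and the host ends in a TLD-like suffix.
SANE_URL_RE = re.compile(r'^https?://[^/\s]+\.[a-z]{2,}(?:[/?#]\S*)?$', re.IGNORECASE)


def fetch(url):
    """Download a page and return its text, or '' on network or URL errors."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def extract_urls(html):
    """Return the href values that pass the basic sanity regex."""
    return [u for u in HREF_RE.findall(html) if SANE_URL_RE.match(u)]


def crawl(start_url, max_pages=100, fetch_page=fetch):
    """Visit up to max_pages URLs breadth-first, never fetching a URL twice."""
    visited = set()               # URLs already fetched
    pending = deque([start_url])  # URLs waiting to be fetched
    while pending and len(visited) < max_pages:
        url = pending.popleft()
        if url in visited:
            continue              # skip repeated work
        visited.add(url)
        for found in extract_urls(fetch_page(url)):
            if found not in visited:
                pending.append(found)
    return visited
```

Calling `crawl("https://example.com", max_pages=50)` would return the set of URLs that were fetched. The `fetch_page` parameter is only there so the loop can be tried without network access by passing a function that returns canned HTML.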

Future features

Features I want to implement.

- Search for keywords in a site.
- Categorize sites according to their keywords.
- And much more.

Email: [email protected]

redigaffi/Web-Spider