Spidy (/spˈɪdi/) is a simple, easy-to-use command-line web crawler.
Given a list of web links, it uses Python
`requests <http://docs.python-requests.org>`__ to query the
webpages, and `lxml <http://lxml.de/index.html>`__ to extract all
links from each page. Pretty simple!
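For a sense of how that fetch-and-extract step works, here is a minimal
sketch (not spidy's actual code; the ``get_links`` helper and the example
URL are illustrative), assuming ``requests`` and ``lxml`` are installed:

.. code:: python

    import requests
    from lxml import html

    def get_links(url):
        # Illustrative helper, not part of spidy itself.
        # Query the page with requests.
        response = requests.get(url, timeout=10)
        # Parse the returned HTML with lxml.
        tree = html.fromstring(response.content)
        # Turn relative hrefs into absolute URLs so they could be crawled next.
        tree.make_links_absolute(url)
        # Extract the href of every <a> tag on the page.
        return tree.xpath('//a/@href')

    print(get_links('https://example.com'))

A real crawler repeats this over a growing queue of links; see the docs
for what spidy actually does.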
Created by rivermont (/rɪvɜːrmɒnt/) and FalconWarriorr (/fælcʌnraɪjɔːr/),
and developed with help from these awesome people (listed below).

Looking for technical documentation? Check out
`DOCS.md <https://github.com/rivermont/spidy/blob/master/docs/DOCS.md>`__.
Looking to contribute to this project? Have a look at
`CONTRIBUTING.md <https://github.com/rivermont/spidy/blob/master/docs/CONTRIBUTING.md>`__,
then check out the docs.
- The logo was designed by Cutwell.
- 3onyc - PEP8 compliance.
- DeKaN - Getting PyPI packaging to work.
- esouthren - Unit testing.
- j-setiawan - Paths that work on all operating systems.
- kylesalk - Logging file handlers.
- michellemorales - Confirmed OS X support.
- quatroka - Fixed testing bugs.
- stevelle - Respecting robots.txt.
- thatguywiththatname - README link corrections.
We used the GNU General Public License (see LICENSE) as it was the
license that best suited our needs. Honestly, if you link to this repo
and credit rivermont and FalconWarriorr, and you aren't selling spidy
in any way, then we would love for you to distribute it. Thanks!