Bookmarks tagged [web-crawling]
https://github.com/chriskite/anemone
Ruby library and CLI for crawling websites.
- tags: ruby, web-crawling
- source code
https://github.com/gottfrois/link_thumbnailer
Ruby gem that generates thumbnail images and videos from a given URL, much like the link previews on popular social websites.
- tags: ruby, web-crawling
- source code
https://github.com/sparklemotion/mechanize
Mechanize is a Ruby library that makes automated web interaction easy.
- tags: ruby, web-crawling
- source code
https://github.com/jaimeiniesta/metainspector
Ruby gem for web scraping purposes.
- tags: ruby, web-crawling
- source code
https://github.com/propublica/upton
A batteries-included framework for easy web-scraping.
- tags: ruby, web-crawling
- source code
https://github.com/felipecsl/wombat
Web scraper with an elegant DSL that parses structured data from web pages.
- tags: ruby, web-crawling
- source code
https://github.com/chineking/cola
A distributed crawling framework.
- tags: python, web-crawling, web-scraping
- source code
https://pythonhosted.org/feedparser/
Universal feed parser.
- tags: python, web-crawling, web-scraping
https://github.com/lorien/grab
Site scraping framework.
- tags: python, web-crawling, web-scraping
- source code
https://github.com/MechanicalSoup/MechanicalSoup
A Python library for automating interaction with websites.
- tags: python, web-crawling, web-scraping
- source code
https://github.com/scrapinghub/portia
Visual scraping for Scrapy.
- tags: python, web-crawling, web-scraping
- source code
https://github.com/binux/pyspider
A powerful spider system.
- tags: python, web-crawling, web-scraping
- source code
https://github.com/jmcarp/robobrowser
A simple, Pythonic library for browsing the web without a standalone web browser.
- tags: python, web-crawling, web-scraping
- source code
A fast high-level screen scraping and web crawling framework.
- tags: python, web-crawling, web-scraping
- source code
Highly extensible, highly scalable web crawler for production environments.
- tags: java, web-crawling
https://github.com/yasserg/crawler4j
Simple and lightweight web crawler.
- tags: java, web-crawling
- source code
Scrapes, parses, manipulates and cleans HTML.
- tags: java, web-crawling
SDK for building low-latency and scalable web crawlers.
- tags: java, web-crawling
https://github.com/code4craft/webmagic
Scalable crawler with downloading, URL management, content extraction, and persistence.
- tags: java, web-crawling
- source code