Bookmarks tagged [web-crawling]

^{www.codever.land/bookmarks/t/web-crawling}

anemone

^{https://github.com/chriskite/anemone}

Ruby library and CLI for crawling websites.

tags: ruby, web-crawling
source code

LinkThumbnailer

^{https://github.com/gottfrois/link_thumbnailer}

Ruby gem that generates thumbnail images and videos from a given URL, much like the link previews on popular social websites.

tags: ruby, web-crawling
source code

Mechanize

^{https://github.com/sparklemotion/mechanize}

Mechanize is a Ruby library that makes automated web interaction easy.

tags: ruby, web-crawling
source code

MetaInspector

^{https://github.com/jaimeiniesta/metainspector}

Ruby gem for web scraping purposes.

tags: ruby, web-crawling
source code

Upton

^{https://github.com/propublica/upton}

A batteries-included framework for easy web-scraping.

tags: ruby, web-crawling
source code

Wombat

^{https://github.com/felipecsl/wombat}

Web scraper with an elegant DSL that parses structured data from web pages.

tags: ruby, web-crawling
source code

cola

^{https://github.com/chineking/cola}

A distributed crawling framework.

tags: python, web-crawling, web-scraping
source code

feedparser

^{https://pythonhosted.org/feedparser/}

Universal feed parser.

tags: python, web-crawling, web-scraping

grab

^{https://github.com/lorien/grab}

Site scraping framework.

tags: python, web-crawling, web-scraping
source code

MechanicalSoup

^{https://github.com/MechanicalSoup/MechanicalSoup}

A Python library for automating interaction with websites.

tags: python, web-crawling, web-scraping
source code

portia

^{https://github.com/scrapinghub/portia}

Visual scraping for Scrapy.

tags: python, web-crawling, web-scraping
source code

pyspider

^{https://github.com/binux/pyspider}

A powerful spider system.

tags: python, web-crawling, web-scraping
source code

robobrowser

^{https://github.com/jmcarp/robobrowser}

A simple, Pythonic library for browsing the web without a standalone web browser.

tags: python, web-crawling, web-scraping
source code

scrapy

^{https://scrapy.org/}

A fast high-level screen scraping and web crawling framework.

tags: python, web-crawling, web-scraping
source code

Apache Nutch

^{https://nutch.apache.org}

Highly extensible, highly scalable web crawler for production environments.

tags: java, web-crawling

Crawler4j

^{https://github.com/yasserg/crawler4j}

Simple and lightweight web crawler.

tags: java, web-crawling
source code

jsoup

^{https://jsoup.org}

Scrapes, parses, manipulates and cleans HTML.

tags: java, web-crawling

StormCrawler

^{http://stormcrawler.net}

SDK for building low-latency and scalable web crawlers.

tags: java, web-crawling

webmagic

^{https://github.com/code4craft/webmagic}

Scalable crawler with downloading, URL management, content extraction, and persistence.

tags: java, web-crawling
source code
