A web crawler built on the Guzzle PHP HTTP client, with concurrent requests for faster crawling. It supports downloading any content type and includes a link profiler and response observers.
Thingston Crawler requires:
- PHP 7.1 or above.
Add Thingston Crawler to any PHP project using Composer:
composer require thingston/crawler
Create a new Crawler instance and invoke its start method with any public URI:
use Thingston\Crawler\Crawler;
$crawler = new Crawler();
$crawler->start('https://www.wikipedia.org/');
To process the results of a crawl, you may add as many Observers as you need.
An Observer is a concrete class that implements Thingston\Crawler\Observer\ObserverInterface.
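A minimal sketch of what an observer could look like. The methods of the real ObserverInterface are not shown in this README, so the example uses a hypothetical stand-in interface, ExampleObserverInterface, whose method names (fulfilled, rejected) and signatures are assumptions. Check the interface in the package source for the actual contract before adapting this.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for Thingston\Crawler\Observer\ObserverInterface.
// The real interface's method names and signatures may differ.
interface ExampleObserverInterface
{
    public function fulfilled(string $uri, int $statusCode): void;

    public function rejected(string $uri, string $reason): void;
}

// Concrete observer that records which URIs succeeded and which failed.
final class CountingObserver implements ExampleObserverInterface
{
    /** @var string[] URIs fetched successfully */
    public $fetched = [];

    /** @var array<string, string> failure reason keyed by URI */
    public $failed = [];

    public function fulfilled(string $uri, int $statusCode): void
    {
        $this->fetched[] = $uri;
    }

    public function rejected(string $uri, string $reason): void
    {
        $this->failed[$uri] = $reason;
    }
}

// Simulate the notifications a crawler might send to its observers.
$observer = new CountingObserver();
$observer->fulfilled('https://www.wikipedia.org/', 200);
$observer->rejected('https://example.invalid/', 'Could not resolve host');

printf("fetched: %d, failed: %d\n", count($observer->fetched), count($observer->failed));
```

Keeping each observer focused on a single concern (counting, logging, storing content) makes it easier to combine several of them on the same crawl.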
If you find an issue with this code, please open a ticket in GitHub Issues at https://github.com/thingston/crawler/issues.
Open Source is made of contributions. If you want to contribute to Thingston, please follow these steps:
- Fork the latest version into your own repository.
- Write your changes or additions and commit them.
- Follow PSR-2 coding style standard.
- Make sure your changes are fully covered by unit tests.
- Go to GitHub Pull Requests at https://github.com/thingston/crawler/pulls and create a new request.
Thank you!
All relevant changes to this code are recorded in a separate changelog file.
Version numbers follow recommendations from Semantic Versioning.
Thingston code is maintained under The MIT License.