THIS IS A FORK OF THE SPATIE CRAWLER. IT ADDS A CALLBACK FUNCTION TO RECEIVE ALL THE LINKS ON THE CRAWLED PAGE.
This package provides a class to crawl links on a website.
Spatie is a webdesign agency in Antwerp, Belgium. You'll find an overview of all our open source projects on our website.
This package can be installed via Composer:
composer require spatie/crawler
The crawler can be instantiated like this:
Crawler::create()
->setCrawlObserver(<implementation of \Spatie\Crawler\CrawlObserver>)
->startCrawling($url);
The argument passed to setCrawlObserver must be an object that implements the \Spatie\Crawler\CrawlObserver interface:
/**
* Called when the crawler will crawl the given url.
*
* @param \Spatie\Crawler\Url $url
*/
public function willCrawl(Url $url);
/**
* Called when the crawler has crawled the given url.
*
* @param \Spatie\Crawler\Url $url
* @param \Psr\Http\Message\ResponseInterface $response
*/
public function hasBeenCrawled(Url $url, ResponseInterface $response);
/**
* Called when the crawler has found links on the page.
*
* @param \Spatie\Crawler\Url $url
* @param \Illuminate\Support\Collection $links
*/
public function foundLinks(Url $url, $links);
/**
* Called when the crawl has ended.
*/
public function finishedCrawling();
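As a sketch of how the four callbacks fit together, here is a minimal observer that collects the links reported by this fork's foundLinks callback. The class and property names are illustrative, not part of the package:

```php
<?php

use Psr\Http\Message\ResponseInterface;
use Spatie\Crawler\CrawlObserver;
use Spatie\Crawler\Url;

// Illustrative observer that records every link found on each crawled page.
class LinkCollectingObserver implements CrawlObserver
{
    /** @var array maps page URL => array of link URLs found on that page */
    protected $linksPerPage = [];

    public function willCrawl(Url $url)
    {
        // Called before the URL is crawled; useful for logging.
    }

    public function hasBeenCrawled(Url $url, ResponseInterface $response)
    {
        // Called after the URL is crawled; the response can be inspected here.
    }

    public function foundLinks(Url $url, $links)
    {
        // $links is an \Illuminate\Support\Collection of Url objects.
        $this->linksPerPage[(string) $url] = $links->map(function (Url $link) {
            return (string) $link;
        })->all();
    }

    public function finishedCrawling()
    {
        // All pages visited; $this->linksPerPage now holds the full link map.
    }
}
```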
You can tell the crawler not to visit certain URLs by using the setCrawlProfile function. That function expects an object that implements the \Spatie\Crawler\CrawlProfile interface:
/**
* Determine if the given url should be crawled.
*
* @param \Spatie\Crawler\Url $url
*
* @return bool
*/
public function shouldCrawl(Url $url);
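For example, a profile could restrict the crawl to a single host so external links are skipped. This is a sketch; the class name is made up, and it assumes the Url object exposes a public $host property:

```php
<?php

use Spatie\Crawler\CrawlProfile;
use Spatie\Crawler\Url;

// Illustrative profile that only allows URLs on the given base host.
class InternalOnlyCrawlProfile implements CrawlProfile
{
    /** @var string */
    protected $host;

    public function __construct($baseUrl)
    {
        $this->host = parse_url($baseUrl, PHP_URL_HOST);
    }

    public function shouldCrawl(Url $url)
    {
        // Crawl only URLs whose host matches the base URL's host.
        return $url->host === $this->host;
    }
}
```

It would be passed to the crawler alongside the observer, e.g. Crawler::create()->setCrawlProfile(new InternalOnlyCrawlProfile($url))->setCrawlObserver(...)->startCrawling($url).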
Please see CHANGELOG for more information on what has changed recently.
Please see CONTRIBUTING for details.
If you discover any security related issues, please email [email protected] instead of using the issue tracker.
The MIT License (MIT). Please see License File for more information.