Crawl links on a website

This is a fork of the Spatie crawler. It adds a callback function to receive all the links on the crawled page.

This package provides a class to crawl links on a website.

Installation

This package can be installed via Composer:

composer require spatie/crawler

Usage

The crawler can be instantiated like this:

Crawler::create()
    ->setCrawlObserver(<implementation of \Spatie\Crawler\CrawlObserver>)
    ->startCrawling($url);

The argument passed to setCrawlObserver must be an instance that implements the \Spatie\Crawler\CrawlObserver interface:

/**
 * Called when the crawler will crawl the given url.
 *
 * @param \Spatie\Crawler\Url $url
 */
public function willCrawl(Url $url);

/**
 * Called when the crawler has crawled the given url.
 *
 * @param \Spatie\Crawler\Url                 $url
 * @param \Psr\Http\Message\ResponseInterface $response
 */
public function hasBeenCrawled(Url $url, ResponseInterface $response);

/**
 * Called when the crawler has found links on the page
 *
 * @param \SimZal\Crawler\Url                       $url
 * @param \Illuminate\Support\Collection            $links
 */
public function foundLinks(Url $url, $links);

/**
 * Called when the crawl has ended.
 */
public function finishedCrawling();
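
As an illustration of the interface above, here is a minimal sketch of an observer that records the status code of each crawled page and the links reported for it. The Url, ResponseInterface and CrawlObserver declarations at the top are hypothetical stand-ins so the sketch runs without Composer; in a real project you would import the package's own classes and psr/http-message instead. The LinkCollector name, its properties and the assumption that each link can be cast to a string are illustrative, not part of the package.

```php
<?php

// Hypothetical stand-ins so this sketch runs on its own. In a real project,
// use the package's Url and CrawlObserver and PSR-7's ResponseInterface.
class Url
{
    private $url;

    public function __construct($url)
    {
        $this->url = $url;
    }

    public function __toString()
    {
        return $this->url;
    }
}

interface ResponseInterface
{
    public function getStatusCode();
}

interface CrawlObserver
{
    public function willCrawl(Url $url);
    public function hasBeenCrawled(Url $url, ResponseInterface $response);
    public function foundLinks(Url $url, $links);
    public function finishedCrawling();
}

// Records the status code of every crawled page and the links found on it.
class LinkCollector implements CrawlObserver
{
    public $statuses = [];
    public $linksByPage = [];

    public function willCrawl(Url $url)
    {
        // Nothing to prepare in this sketch.
    }

    public function hasBeenCrawled(Url $url, ResponseInterface $response)
    {
        $this->statuses[(string) $url] = $response->getStatusCode();
    }

    public function foundLinks(Url $url, $links)
    {
        // $links is documented as a Collection; foreach treats it like an array.
        // Assumption: every link can be cast to a string.
        foreach ($links as $link) {
            $this->linksByPage[(string) $url][] = (string) $link;
        }
    }

    public function finishedCrawling()
    {
        echo count($this->statuses) . " page(s) crawled\n";
    }
}
```

An instance of such a class would then be passed to setCrawlObserver before calling startCrawling.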

Filtering certain URLs

You can tell the crawler not to visit certain URLs by using the setCrawlProfile function. That function expects an object that implements the \Spatie\Crawler\CrawlProfile interface:

/**
 * Determine if the given url should be crawled.
 *
 * @param \Spatie\Crawler\Url $url
 *
 * @return bool
 */
public function shouldCrawl(Url $url);
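
A profile could, for example, restrict the crawl to a single host. The sketch below is one possible way to do that, not the package's own implementation. The Url and CrawlProfile declarations are hypothetical stand-ins so the code runs without Composer, and the public host property on Url is an assumption; check the package's Url class before relying on it. SameHostProfile is an illustrative name.

```php
<?php

// Hypothetical stand-ins so this sketch runs on its own. In a real project,
// use the package's Url class and CrawlProfile interface instead.
class Url
{
    public $host; // assumption: the real Url class exposes the host like this

    public function __construct($url)
    {
        $this->host = parse_url($url, PHP_URL_HOST);
    }
}

interface CrawlProfile
{
    public function shouldCrawl(Url $url);
}

// Only allows URLs whose host matches the one given at construction.
class SameHostProfile implements CrawlProfile
{
    private $host;

    public function __construct($host)
    {
        $this->host = $host;
    }

    public function shouldCrawl(Url $url)
    {
        return $url->host === $this->host;
    }
}

$profile = new SameHostProfile('example.com');

var_dump($profile->shouldCrawl(new Url('https://example.com/about'))); // bool(true)
var_dump($profile->shouldCrawl(new Url('https://other.test/')));       // bool(false)
```

An instance of such a profile would be passed to setCrawlProfile before starting the crawl.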

Changelog

Please see CHANGELOG for more information on what has changed recently.

Contributing

Please see CONTRIBUTING for details.

Security

If you discover any security related issues, please email [email protected] instead of using the issue tracker.

Credits

About Spatie

Spatie is a webdesign agency in Antwerp, Belgium. You'll find an overview of all our open source projects on our website.

License

The MIT License (MIT). Please see License File for more information.
