Currently, nodes are inserted into the $depthTree property in vendor/spatie/crawler/src/Crawler.php using a depth-first search, and I find this very inconvenient:
Let's assume we crawl a website that has 4 paths: /, /index, /terms and /terms-1, and that they are crawled in that order. Then the $depthTree at the end of the crawl will look like:
/
  /index
    /terms
      /terms-1
  /terms
So what is the depth() of the node with value /terms-1? With the structure above, the answer is 3. However, I think (arguably) that it should be 2, because /terms is also a direct child of the root. The method that computes the depth seems correct.
In my view, the problem is the structure: nodes should be inserted using a breadth-first search instead:
/
  /index
    /terms
  /terms
    /terms-1
Because we start from the root /, this way we always have the shortest path to each node, and the $maximumDepth setting can be explained simply as the minimum number of clicks needed to reach a link.
I would like to hear what you think.
This is an example of an implementation using BFS:
public function addToDepthTree(UriInterface $url, UriInterface $parentUrl, ?Node $node = null): ?Node
{
    // No depth limit configured: skip the tree and return a standalone node.
    if (is_null($this->maximumDepth)) {
        return new Node((string) $url);
    }

    $queue = new SplQueue(); // Use a queue for BFS (needs `use SplQueue;` in a namespaced file)
    $queue->enqueue($this->depthTree);

    while (!$queue->isEmpty()) {
        $node = $queue->dequeue();

        // The first match in BFS order is the shallowest node with the parent's URL.
        if ($node->getValue() === (string) $parentUrl) {
            $newNode = new Node((string) $url);
            $node->addChild($newNode);

            return $newNode;
        }

        foreach ($node->getChildren() as $currentNode) {
            $queue->enqueue($currentNode); // Enqueue children for BFS
        }
    }

    return null;
}
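
To check the depth claim in isolation, here is a minimal sketch that rebuilds the example tree twice, once locating the parent depth-first and once breadth-first. It does not use the crawler at all: `DemoNode`, `findDepthFirst`, `findBreadthFirst` and `buildTree` are hypothetical names made up for this illustration, and the link list (/ links to /index and /terms, /index links to /terms, /terms links to /terms-1) is my assumption about the site that produces the trees shown above.

```php
<?php

// Hypothetical stand-in for the crawler's tree node; only what this sketch needs.
final class DemoNode
{
    private string $value;
    private ?DemoNode $parent = null;
    /** @var DemoNode[] */
    private array $children = [];

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function getValue(): string
    {
        return $this->value;
    }

    /** @return DemoNode[] */
    public function getChildren(): array
    {
        return $this->children;
    }

    public function addChild(DemoNode $child): void
    {
        $child->parent = $this;
        $this->children[] = $child;
    }

    // Number of edges between this node and the root.
    public function depth(): int
    {
        return $this->parent === null ? 0 : $this->parent->depth() + 1;
    }
}

// Returns the first node with the given value in depth-first (pre-order) order.
function findDepthFirst(DemoNode $node, string $value): ?DemoNode
{
    if ($node->getValue() === $value) {
        return $node;
    }
    foreach ($node->getChildren() as $child) {
        $found = findDepthFirst($child, $value);
        if ($found !== null) {
            return $found;
        }
    }
    return null;
}

// Returns the first node with the given value in breadth-first order,
// which is the shallowest node carrying that value.
function findBreadthFirst(DemoNode $root, string $value): ?DemoNode
{
    $queue = new SplQueue();
    $queue->enqueue($root);
    while (!$queue->isEmpty()) {
        $node = $queue->dequeue();
        if ($node->getValue() === $value) {
            return $node;
        }
        foreach ($node->getChildren() as $child) {
            $queue->enqueue($child);
        }
    }
    return null;
}

// Builds the tree from [url, parentUrl] pairs, using $find to locate each parent.
function buildTree(callable $find): DemoNode
{
    $root = new DemoNode('/');
    $links = [ // assumed link structure of the example site
        ['/index', '/'],
        ['/terms', '/'],
        ['/terms', '/index'],
        ['/terms-1', '/terms'],
    ];
    foreach ($links as [$url, $parentUrl]) {
        $parent = $find($root, $parentUrl);
        if ($parent !== null) {
            $parent->addChild(new DemoNode($url));
        }
    }
    return $root;
}

$dfsDepth = findDepthFirst(buildTree('findDepthFirst'), '/terms-1')->depth();
$bfsDepth = findBreadthFirst(buildTree('findBreadthFirst'), '/terms-1')->depth();

echo "DFS insertion: depth {$dfsDepth}\n"; // prints "DFS insertion: depth 3"
echo "BFS insertion: depth {$bfsDepth}\n"; // prints "BFS insertion: depth 2"
```

With the depth-first lookup, /terms-1 is attached to the /terms node under /index, because that copy is visited first. With the breadth-first lookup it is attached to the /terms node directly under the root, which is the behaviour I am proposing.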