This PR removes the async lock from the crawlers and replaces it with a semaphore with a capacity of 20.

In reality, neither the lock nor the semaphore is needed. The requests are actually limited by the `request_limiter` of each crawler, not the lock. However, I could not remove the lock because doing so would break the logic for the UI scrape queue, so I replaced it with a semaphore instead. The lock was making each crawler behave synchronously (one request at a time), and the `request_limiter` was never close to being reached.

This only applies to the crawlers. The downloaders already have a semaphore with a different capacity per domain, so they were not affected. The default capacity for each downloader is 3, defined by `--max-simultaneous-downloads-per-domain`.

## Disadvantages

The only drawback of replacing the lock with a semaphore (or eventually removing the lock altogether) is that the `request_limiter` is defined per crawler, and almost all crawlers currently have a generic `10 requests / sec` limit. Some crawlers may require fine-tuning the limiter to make sure CDL does not trigger `429s`.