Releases: andykais/scrape-pages

v3.5.2

10 Sep 22:22

placeholder

v3.5.0

07 Feb 01:52
0fcdf02

Breaking change

  • the scrape function is no longer called directly. Code that previously called
const { start, query } = scrape(config, options, params)

should now create a ScraperProgram instance and start it:

const scraper = new ScraperProgram(config, options, params)
scraper.start()
  • query arguments changed from query(args: { scraper: string[], groupBy: string }) to query(scrapers: string[], { groupBy: string })
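
For callers migrating existing query calls, the argument change above can be sketched as a small adapter. The names OldQueryArgs, NewQueryArgs and toNewQueryArgs are hypothetical and exist only for illustration; they are not part of the library. Only the two argument shapes are taken from the note above.

```typescript
// Hypothetical types mirroring the two query signatures named in the release note.
type OldQueryArgs = { scraper: string[]; groupBy: string };
type NewQueryArgs = [scrapers: string[], options: { groupBy: string }];

// Hypothetical helper: converts pre-3.5.0 query arguments to the 3.5.0 shape.
function toNewQueryArgs(old: OldQueryArgs): NewQueryArgs {
  return [old.scraper, { groupBy: old.groupBy }];
}

// Example: the scraper list becomes the first positional argument and
// groupBy moves into a separate options object.
const [migratedScrapers, migratedOptions] = toNewQueryArgs({
  scraper: ["image", "tag"],
  groupBy: "image",
});
```

The scraper names "image" and "tag" are placeholders, not names defined by the library.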

Enhancements/fixes

  • add a scraper lock file to prevent two scrapers from clashing over the same directory
  • tests have better coverage of emitter values
  • add class methods for emitter emit events (stop, stopScraper, useRateLimiter)
  • error if commands are used unexpectedly (e.g. calling stop() before start())
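
The lock file idea from the list above can be sketched with Node's exclusive-create flag. This is a minimal illustration under stated assumptions, not the library's implementation: the file name scraper.lock, the acquireLock helper and the error message are all hypothetical.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical lock file name; the name the library uses is not stated in these notes.
const LOCK_FILE = "scraper.lock";

// Creates the lock file exclusively. The "wx" flag makes the write fail with
// EEXIST if the file already exists, so only one caller can hold the lock.
// Returns a function that releases the lock.
function acquireLock(folder: string): () => void {
  const lockPath = path.join(folder, LOCK_FILE);
  try {
    fs.writeFileSync(lockPath, new Date().toISOString(), { flag: "wx" });
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") {
      throw new Error(`folder is already in use by another scraper: ${folder}`);
    }
    throw err;
  }
  return () => fs.rmSync(lockPath, { force: true });
}

// Usage: a second acquire on the same folder throws until the first is released.
const lockFolder = fs.mkdtempSync(path.join(os.tmpdir(), "scrape-lock-"));
const releaseLock = acquireLock(lockFolder);
let secondAcquireFailed = false;
try {
  acquireLock(lockFolder);
} catch {
  secondAcquireFailed = true;
}
releaseLock();
```

Exclusive creation avoids a race between checking for the file and writing it, which a separate existsSync check followed by a write would not.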

v3.4.2

12 Oct 00:10
f6073f4

Breaking change

N/A

Enhancements/fixes

  • prevent download filenames from clashing. They are based on database ids now instead of url filenames (#39)
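
To illustrate why id-based names avoid clashes, here is a minimal sketch. The downloadFilename helper is hypothetical, and keeping the extension from the url is an assumption; the note above only says that names are now based on database ids.

```typescript
import * as path from "node:path";

// Hypothetical helper: names a download after its database id. Keeping the
// url's extension is an assumption made for this sketch.
function downloadFilename(id: number, url: string): string {
  const extension = path.posix.extname(new URL(url).pathname);
  return `${id}${extension}`;
}

// Two urls that end in the same filename no longer map to the same file on disk.
const firstName = downloadFilename(1, "https://example.com/a/image.jpg");
const secondName = downloadFilename(2, "https://example.com/b/image.jpg");
```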

v3.4.1

11 Oct 05:46

Breaking change

  • config structure changed again to a flatter, more intuitive structure (#33).

Enhancements/fixes

  • upgrade dev dependencies in response to security alerts
  • added an internal query walker to help debug query ordering. Hopefully this means I can move faster on query bugs (#36).
  • speed up query by removing unnecessary compares & extra recursion loops

v3.3.1

24 Jul 18:03
4cb3978

Breaking change

N/A

Enhancements/fixes

  • upgrade dev dependencies in response to security alerts

v3.3.0

04 Apr 18:46

Breaking change

N/A

Enhancements/fixes

  • add cache: true flag for either individual scrapers, or a whole project.
  • add metadata.json file to download folder, log warning when library versions don't match
  • check if scrapers in config.run exist in config.scrapers
  • add download byteLength to query results
  • add stop:<scraper> emittable event
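
The check that scrapers referenced in config.run exist in config.scrapers can be sketched as a tree walk. The RunNode and Config shapes below are hypothetical simplifications made for this example (the real structure is described in the readme), and findUndefinedScrapers is not a library function.

```typescript
// Hypothetical, simplified config shape used only for this sketch.
type RunNode = { scraper: string; children?: RunNode[] };
type Config = { scrapers: Record<string, unknown>; run: RunNode };

// Collects every scraper name referenced in the run tree that has no definition.
function findUndefinedScrapers(config: Config): string[] {
  const missing: string[] = [];
  const visit = (node: RunNode): void => {
    // hasOwnProperty avoids false matches on inherited keys such as "toString".
    if (!Object.prototype.hasOwnProperty.call(config.scrapers, node.scraper)) {
      missing.push(node.scraper);
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(config.run);
  return missing;
}

// Example: "tag" is referenced in run but never defined in scrapers.
const undefinedScrapers = findUndefinedScrapers({
  scrapers: { index: {}, image: {} },
  run: { scraper: "index", children: [{ scraper: "image" }, { scraper: "tag" }] },
});
```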

v3.2.1

05 Mar 03:36

Breaking change

  • config has changed significantly to separate scraper definitions from the download flow. See readme.
  • separate reusable options from one-offs so scrape now has three parameters: config, options and params.
  • scraper is invoked differently. scrape(config, options, params) yields a query and start, where start triggers the scraper and folder creation. (#15)

Enhancements/fixes

  • add limit field on parse scraper configs (#9)
  • validate scraper names & input keys (#10)
  • add log file rotation
  • Travis CI now runs tests on macOS; Windows coming soon!

v3.2.0

02 Mar 00:17

Breaking change

  • config has changed significantly to separate scraper definitions from the download flow. See readme.
  • separate reusable options from one-offs so scrape now has three parameters: config, options and params.
  • scraper is invoked differently. scrape(config, options, params) yields a query and start, where start triggers the scraper and folder creation. (#15)

Enhancements/fixes

  • add limitToValues field on scraper configs (#9)
  • validate scraper names & input keys (#10)
  • add log file rotation
  • Travis CI now runs tests on macOS; Windows coming soon!

v3.1.2

20 Jan 22:45
2f540d5

Breaking change

N/A

Enhancements/fixes

  • change the surface API to a function returning a promise:
const siteScraper = new PageScraper(config)
const emitter = siteScraper.run(options)
// becomes
const { on, emit, query } = await scrape(config, options)
  • add logging & file logging

v3.1.0

30 Dec 05:44

Breaking change

N/A

Enhancements/fixes

  • add scrapeNext clause to download config
  • add downloadPriority flag to run options
  • switch to typescript
  • add functional tests