- [X] Modify runRelayFetch logic to store and use last known noteId (most recently fetched).
- [X] Implement a note iterator/generator with a limit count
- [X] Use named cursors, e.g. 'front/abstract', 'rear/*'
- [X] Run fetching w/ sort num:asc
- [ ] Create logic to re-run fetch for all papers
- [ ] Use named cursors, stored in MongoDB (see the iterator/cursor sketch after this list)
- [X] Create test plan
- [X] Run against live site w/dev config
- [X] Run against mocked API
- [X] Koa-based API mimicking OpenReview (see the mock-server sketch after this list)
- [X] /login API mockup
- [X] /notes API mockup
- [X] Profile and report API fetch times (see the timing sketch after this list)
- [ ] Delete downloaded HTML files/artifacts when done
- [ ] Delete /tmp files created by Chrome
- [ ] Reap dead Chrome instances (see the cleanup sketch after this list)
- [ ] Deploy new code manually
- [ ] Merge adam -> iesl
- [ ] Get PM2 running w/o Bree wrapper (see the PM2 ecosystem sketch at the end of these notes)
- [ ] Refactor the monitor/stats module to show basic functionality
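
A minimal sketch of the resumable iterator plus named-cursor idea from the fetch items above, assuming notes arrive sorted num:asc through a caller-supplied fetchNotes function and that cursors are persisted in a MongoDB collection keyed by name. The CursorDoc shape, field names, and batch size are illustrative assumptions, not the project's actual schema.

```ts
// Sketch only: resumable note iteration with a named cursor persisted in MongoDB.
// The CursorDoc shape, batch size, and fetchNotes signature are assumptions.
import { Collection } from 'mongodb';

export interface Note { id: string; number: number; content?: Record<string, unknown>; }

export interface CursorDoc {
  name: string;        // named cursor, e.g. 'front/abstract' or 'rear/*'
  noteId?: string;     // last known noteId (most recently fetched)
  offset: number;      // position in the num:asc ordering
  updatedAt: Date;
}

export async function* noteIterator(
  cursors: Collection<CursorDoc>,
  fetchNotes: (offset: number, batchSize: number) => Promise<Note[]>, // notes sorted num:asc
  cursorName: string,
  limit = 100,         // max notes yielded per run
): AsyncGenerator<Note> {
  const cursor: CursorDoc =
    (await cursors.findOne({ name: cursorName })) ??
    { name: cursorName, offset: 0, updatedAt: new Date() };

  let yielded = 0;
  while (yielded < limit) {
    const batch = await fetchNotes(cursor.offset, Math.min(50, limit - yielded));
    if (batch.length === 0) break;   // caught up; nothing new to fetch

    for (const note of batch) {
      yield note;
      yielded += 1;
      cursor.offset += 1;
      cursor.noteId = note.id;
    }
    // Persist progress so the next run resumes instead of re-fetching everything.
    await cursors.updateOne(
      { name: cursorName },
      { $set: { noteId: cursor.noteId, offset: cursor.offset, updatedAt: new Date() } },
      { upsert: true },
    );
  }
}
```

Passing fetchNotes in as a parameter keeps the single OpenReview API call path in one place and makes the iterator easy to point at the mocked API below.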
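A sketch of the Koa-based mock, assuming the fetcher only needs /login and /notes. The route paths match the test-plan items above, but the response shapes (token, notes/count, content fields) are simplified approximations rather than the real OpenReview API schema.

```ts
// Sketch only: a tiny Koa server standing in for the OpenReview API in tests.
// Response shapes are simplified approximations, not the real API schema.
import Koa from 'koa';
import Router from '@koa/router';

const fakeNotes = Array.from({ length: 25 }, (_, i) => ({
  id: `note-${i + 1}`,
  number: i + 1,
  content: { title: `Paper ${i + 1}`, abstract: `Abstract for paper ${i + 1}` },
}));

const app = new Koa();
const router = new Router();

// /login mockup: always succeeds with a fake token.
router.post('/login', (ctx) => {
  ctx.body = { token: 'mock-token', user: { id: '~Mock_User1' } };
});

// /notes mockup: offset/limit paging over the fake notes, already in num:asc order.
router.get('/notes', (ctx) => {
  const offset = Number(ctx.query.offset ?? 0);
  const limit = Number(ctx.query.limit ?? 10);
  ctx.body = { notes: fakeNotes.slice(offset, offset + limit), count: fakeNotes.length };
});

app.use(router.routes()).use(router.allowedMethods());
app.listen(4100, () => console.log('mock OpenReview API listening on :4100'));
```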
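A small helper of the kind the profiling item implies; where the timings end up is left to the monitor/stats module, so this sketch just logs them.

```ts
// Sketch only: time an API fetch and report the elapsed milliseconds.
// Reporting here is just a log line; wiring it into the stats module is a TODO.
import { performance } from 'node:perf_hooks';

export async function timed<T>(label: string, fn: () => Promise<T>): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    const elapsedMs = performance.now() - start;
    console.log(`[timing] ${label}: ${elapsedMs.toFixed(1)} ms`);
  }
}

// Usage: const notes = await timed('notes.fetch', () => fetchNotes(offset, 50));
```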
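A best-effort cleanup sketch for the Chrome items; the /tmp profile prefixes and the ppid-1 orphan heuristic are assumptions about the deployment environment rather than verified project behavior.

```ts
// Sketch only: clean up leftover Chrome temp dirs and orphaned headless Chrome
// processes. The /tmp prefixes and the ppid==1 orphan heuristic are assumptions.
import { readdir, rm } from 'node:fs/promises';
import { join } from 'node:path';
import { exec } from 'node:child_process';
import { promisify } from 'node:util';

const execAsync = promisify(exec);

// Temp profile directories that puppeteer/Chrome commonly leave behind.
const TMP_PREFIXES = ['puppeteer_dev_chrome_profile-', '.org.chromium.Chromium.'];

export async function cleanupChromeTmp(tmpDir = '/tmp'): Promise<void> {
  for (const entry of await readdir(tmpDir)) {
    if (TMP_PREFIXES.some((prefix) => entry.startsWith(prefix))) {
      await rm(join(tmpDir, entry), { recursive: true, force: true });
    }
  }
}

export async function reapOrphanedChrome(): Promise<void> {
  // Headless Chrome whose launching worker died gets reparented (ppid 1 here);
  // kill those instances so they stop holding memory and /tmp space.
  const { stdout } = await execAsync('ps -eo pid,ppid,args');
  for (const line of stdout.split('\n')) {
    const match = line.trim().match(/^(\d+)\s+1\s+.*chrome.*--headless/);
    if (match) {
      try {
        process.kill(Number(match[1]), 'SIGKILL');
      } catch {
        // Process already exited between listing and kill; ignore.
      }
    }
  }
}
```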
- Keep track of slow extraction IDs
- Fetch should only hit the OpenReview API once.
- Keep track of a hash of the extracted fields and note when they change (see the hashing sketch below)
- Re-extracting from the beginning is a local-only operation; write a log record when updates should/do happen
- Use PM2 deploy hooks to auto-deploy (see the post-deploy hook in the ecosystem sketch below)
- Allow multiple extraction workers when responseUrl is known and hosts can be spread out over time
- Make the spider not write body/header files (use a CLI option to control this behavior)
- Use url_status responseUrl to avoid redirect issues
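
A sketch of the field-hash idea; the ExtractedFields shape and where the previous hash is stored are assumptions.

```ts
// Sketch only: hash extracted fields so re-extraction can detect changes.
// The ExtractedFields shape and where hashes are stored are assumptions.
import { createHash } from 'node:crypto';

export interface ExtractedFields {
  title?: string;
  abstract?: string;
  pdfLink?: string;
}

export function fieldHash(fields: ExtractedFields): string {
  // Sort entries so the hash is independent of field insertion order.
  const canonical = JSON.stringify(
    Object.fromEntries(Object.entries(fields).sort(([a], [b]) => a.localeCompare(b))),
  );
  return createHash('sha1').update(canonical).digest('hex');
}

// The previous hash would come from the stored extraction record; log when it changes.
export function fieldsChanged(fields: ExtractedFields, previousHash?: string): boolean {
  return fieldHash(fields) !== previousHash;
}
```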
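A sketch of what an ecosystem.config.js might look like once the Bree wrapper is gone, with a post-deploy hook for the auto-deploy note. PM2 reads this file as plain JavaScript (so this one sketch is JS rather than TypeScript), and every name, path, host, repo, and branch below is a placeholder.

```js
// ecosystem.config.js (sketch only) -- run the service under PM2 directly,
// without the Bree wrapper, and use PM2's deploy hooks for redeploys.
// App name, script path, repo, host, and branch are placeholders, not real values.
module.exports = {
  apps: [
    {
      name: 'extraction-service',     // placeholder app name
      script: './dist/service.js',    // entry point started by PM2, no Bree in between
      instances: 1,
      autorestart: true,
      env: { NODE_ENV: 'production' },
    },
  ],
  deploy: {
    production: {
      user: 'deploy',                                 // placeholder ssh user
      host: ['extraction-host.example.org'],          // placeholder host
      ref: 'origin/iesl',                             // branch from the merge item above
      repo: 'git@github.com:example/extraction.git',  // placeholder repo
      path: '/opt/extraction-service',
      // Deploy hook: build and reload the app after each `pm2 deploy`.
      'post-deploy':
        'npm install && npm run build && pm2 reload ecosystem.config.js --only extraction-service',
    },
  },
};
```

With this in place, `pm2 deploy ecosystem.config.js production setup` provisions the target once, and `pm2 deploy ecosystem.config.js production` pulls the branch and runs the post-deploy hook on each deploy.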