Data and index architecture and implications
M-Nicholls edited this page Apr 15, 2015 · 1 revision
Background key points:
- The Atlas database and the search index are separate data stores
- All Atlas searching, record list retrieval, downloads and facets rely on the index. Only when viewing record details is a user looking at the data in the database.
- One index serves the production Atlas. Other indexes, older or newer, may also be present as a result of other re-indexing processes.
- Re-indexing generates a new index from what is currently in the database. The new index must then be manually allocated to production.
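The key points above can be sketched as a small in-memory model. This is an illustration only: the `Atlas` class and its method names are hypothetical and do not come from the Atlas codebase, and plain dictionaries stand in for the real database and index.

```python
# Hypothetical in-memory model; none of these names come from the Atlas codebase.
class Atlas:
    def __init__(self):
        self.database = {}      # record id -> full record
        self.indexes = {}       # index name -> {record id -> indexed fields}
        self.production = None  # name of the index that serves searches

    def reindex(self, name):
        """Build a new index from the current database contents.
        The new index is NOT put into production automatically."""
        self.indexes[name] = {rid: dict(rec) for rid, rec in self.database.items()}

    def allocate_to_production(self, name):
        """The manual step: point production at an existing index."""
        if name not in self.indexes:
            raise KeyError(f"unknown index: {name}")
        self.production = name

    def search(self, field, value):
        """Searches, record lists, downloads and facets read the production index only."""
        index = self.indexes.get(self.production, {})
        return sorted(rid for rid, doc in index.items() if doc.get(field) == value)

    def record_details(self, rid):
        """The record details view reads the database."""
        return self.database.get(rid)


atlas = Atlas()
atlas.database["r1"] = {"species": "Vulpes vulpes"}
atlas.reindex("index_a")
print(atlas.search("species", "Vulpes vulpes"))  # [] (no index allocated to production yet)
atlas.allocate_to_production("index_a")
print(atlas.search("species", "Vulpes vulpes"))  # ['r1']
```

In this sketch the record is visible through `record_details` as soon as it is in the database, but it only becomes searchable after both the re-index and the manual allocation have happened.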
The two can get out of sync, e.g.:
- A record can be found in the search, but when the user goes to view the details nothing comes back.
- Record counts don't match a recent load of data, and the newly loaded records cannot be found.
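Both symptoms come down to a difference between the set of record IDs in the database and the set in the production index. A minimal sketch of such a comparison follows; `sync_report` is a hypothetical helper, and how the two ID lists would actually be obtained from the database and the index is not covered here.

```python
def sync_report(database_ids, index_ids):
    """Compare record IDs held by the two stores (hypothetical helper)."""
    database_ids, index_ids = set(database_ids), set(index_ids)
    return {
        # Found by search, but the details view comes back empty.
        "in_index_only": sorted(index_ids - database_ids),
        # Loaded into the database, but not yet searchable or counted.
        "in_database_only": sorted(database_ids - index_ids),
    }


report = sync_report(database_ids=["r1", "r2", "r4"], index_ids=["r1", "r2", "r3"])
print(report)  # {'in_index_only': ['r3'], 'in_database_only': ['r4']}
```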
Implications for processing and exporting data:
- The data resource load, sample and process steps add content to the database but not to the index
- delete removes records from both the database and the index in a single process
- exports download data from the index
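The three behaviours above can be summarised by which store each operation touches. The functions below are hypothetical stand-ins that use dictionaries for the two stores; they are not the actual Atlas commands.

```python
def load(database, records):
    """Load/sample/process: writes to the database only."""
    database.update(records)


def delete(database, index, record_ids):
    """Delete: removes records from the database and the index in one pass."""
    for rid in record_ids:
        database.pop(rid, None)
        index.pop(rid, None)


def export(index):
    """Export/download: reads from the index, never from the database."""
    return sorted(index)


database = {"r1": {}, "r2": {}}
index = {"r1": {}, "r2": {}}

load(database, {"r3": {}})       # r3 is now in the database only
delete(database, index, ["r1"])  # r1 is gone from both stores
print(export(index))             # ['r2']
print(sorted(database))          # ['r2', 'r3']
```

Note that the export misses `r3` even though it has been loaded, because nothing has re-indexed the database since.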
Timing:
- Deletes must be run against the most recent production index. Running a delete while a re-index is in progress, or before the newly generated index has been swapped into production, results in records that are deleted from the database but still appear in the index.
- Live indexing (not recommended) must be run against the most recent production index. Running a live index while a re-index is in progress, or before the newly generated index has been swapped into production, results in data that is in the database but not in the index.
- The load, sample and process steps can be run at any time to add data to the database. The records will not appear in the index until the next re-index has completed and the new index has been allocated to production.
- Downloads should be run following a re-index, because exports read from the index rather than the database
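The delete timing problem can be replayed step by step. This sketch makes a simplifying assumption that a re-index works from the database contents as they were when it started; the variable names are illustrative only.

```python
database = {"r1": {}, "r2": {}}
production_index = dict(database)  # the index currently serving production

# A re-index starts and builds a new index from the database as it is now
# (assumption: the new index does not see later changes to the database).
new_index = dict(database)

# A delete runs before the swap. It only touches the database and the
# index that is in production at that moment.
database.pop("r1")
production_index.pop("r1")

# The newly generated index is then allocated to production.
production_index = new_index

# r1 is searchable again, but its details can no longer be retrieved.
stale = sorted(set(production_index) - set(database))
print(stale)  # ['r1']
```

The live indexing case is the mirror image: a record added to the database and the old production index after the re-index has started is missing from the new index once it is swapped in.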