This repository is part of the Find Case Law project at The National Archives. For more information on the project, check the documentation.
Marks up judgments in Find Case Law with references to other cases and legislation.
A description of the system's architecture
There is a suite of tests that can be run locally with pytest -m "not integration"
or scripts/test
but you'll need to ensure you've installed the requirements in src/tests/requirements.txt
You can also obtain a test coverage report with coverage run --source . -m pytest && coverage report
The tests are currently run in CI as specified in .github/workflows/ci_lint_and_test.yml
There are a number of places where enrichment can be turned off:
-
Marklogic Username/Password
- Go to the production Marklogic interface, "Admin" (top), "Security" (left), "Users" (left), "enrichment-engine" (mid).
- Make sure you know where the password is stored so you can put access back afterwards!
- Changing the password will mean no Enrichment processes can interact with Marklogic -- no getting documents, no uploading them
- Messages will still be sent from the Editor interface, and will build up.
- Purge the queues in AWS before turning Enrichment back on. The database will not load stale documents due to locks expiring.
- Seemed to work well last time, but there were a lot of warnings
-
AWS Lambda that fetches XML
- Not actually tested in anger!
- Log into
da-caselaw-enrichment
. Make sure to switch toeu-west-2
/London
. - In Lambda, Functions, select
tna-s3-tna-production-fetch-xml
- Top left, press Throttle.
- This will prevent any ingestion of the incoming messages which will build up
- Anything currently in process will finish and will continue to run to completion
- You can change the concurrency settings to unthrottle it
- Note that manual changes to the lambda settings will likely be lost if new code is deployed
-
Modifying the code
- We could change either the code in one of the lambdas -- probably
fetch_xml
orpush_enriched_xml
for the start/end of the process - We could also modify the privileged API, but that potentially affect all users of it but there aren't any at the time of writing
- We could change either the code in one of the lambdas -- probably
Situations when you may want to debug an enrichment run:
- a document we expect to have been enriched, has not been enriched as expected.
- an AWS error alert for a lambda function has been raised to us (probably through email notification subscription)
The main ways we have to debug are to look at AWS
logs for lambda
functions and data stored in s3 buckets but we need appropriate information to find these in AWS first.
You will need access to for the staging or production Enrichment AWS space as appropriate to follow these debugging tips. If not, you could skip most of this and attempt at recreating a local test of a lambda function you think might have a problem like in Recreate and debug
tools/see_prod.html
will allow you to search the AWS logs for a specific judgment by URL and link directly to files generated on production.
If you run scripts/get_schema
, the schema will be downloaded, and scripts/validate_local <xmlfile>
will check it complies with the schema
Currently, the main
branch is deployed to staging, and if that doesn't fail, it is then deployed to production.
The version should be an integer string (like "1"): note, however, that pre-December 2023 versions were version "0.1.0".
As a part of each pull request that isn't just keeping versions up to date:
- Update the version number in
enrichment_version.string
insrc/lambdas/determine_legislation_provisions/index.py
- Update
CHANGELOG.md
with a brief description of the change - Create a release on Github with a tag like
v1
. This does nothing, but is useful to help us keep track.