An ONS API used to navigate published datasets, editions and versions.
- Run brew install mongo
- Run brew services start mongodb
- Run brew install neo4j
- Configure neo4j, edit /usr/local/Cellar/neo4j/3.2.0/libexec/conf/neo4j.conf
- Set dbms.security.auth_enabled=false
- Run brew services restart neo4j
- Run api auth stub, see documentation
- Run make debug
Normal sequential order of states:
- created (only on instance)
- submitted (only on instance)
- completed (only on instance)
- edition-confirmed (only on instance - this will create an edition and version, in other words the instance will now be accessible by version endpoints). The dataset next sub-document will also be updated here, and so will the edition (authorised users will see a different latest version link from unauthorised users)
- associated (only on version) - the dataset next sub-document will be updated again, and so will the edition
- published (only on version) - both edition and dataset are updated - must not be changed
It is possible to roll back from associated to edition-confirmed. This covers the case where a PST user has attached the version to the wrong collection: the collection_id needs to be updated with the new one (or removed altogether), and the state needs to revert to edition-confirmed.
Lastly, skipping a state: it is possible to jump from edition-confirmed to published as long as all the mandatory fields are there. There might also be a scenario where the state changes from created to completed, missing out the submitted step, due to race conditions. This is not expected to happen, because the path to completed is longer than the path to submitted.
A state machine has now been implemented on the PUT /versions endpoint. This has been implemented as part of the refactoring to bring the application in line with some of the clean architecture principles. The state machine is instantiated on start up of the application as a singleton object which ensures there is only one instance used in the application for consistency. The state machine code is held in the application package, where all validation on the data sent in the request is performed.
The state machine requires a list of allowed transitions for each state and the dataset type to run. The states of created, submitted, and completed have not been included in the state machine as these are not relevant to the PUT /versions endpoint. If the state sent as part of the PUT request is not in the allowed transitions for the current state for the dataset type then a 400 status code response is returned.
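As an illustration of the allowed-transitions check described above, here is a minimal sketch in Go. It is not the code from the application package: the names StateMachine, NewStateMachine and StatusFor, the dataset type "example-type", and the 200 success code are all assumptions made for this example. The transition table is inferred only from the state descriptions in this README.

```go
package main

import (
	"fmt"
	"net/http"
)

// Hypothetical types: the real state machine in the application package may
// be structured differently.

// transitions maps dataset type -> current state -> states that may follow.
type transitions map[string]map[string][]string

// StateMachine holds the allowed transitions. It is intended to be created
// once at start-up and shared, mirroring the singleton described above.
type StateMachine struct {
	allowed transitions
}

// NewStateMachine builds a state machine from a transition table.
func NewStateMachine(t transitions) *StateMachine {
	return &StateMachine{allowed: t}
}

// StatusFor returns 400 when the requested state is not an allowed transition
// from the current state for the dataset type. The 200 returned otherwise is
// an assumption for this sketch.
func (sm *StateMachine) StatusFor(datasetType, current, requested string) int {
	for _, next := range sm.allowed[datasetType][current] {
		if next == requested {
			return http.StatusOK
		}
	}
	return http.StatusBadRequest
}

// exampleTransitions is an assumed table: created, submitted and completed are
// left out, associated may roll back, and published may not change.
func exampleTransitions() transitions {
	return transitions{
		"example-type": {
			"edition-confirmed": {"associated", "published"},
			"associated":        {"edition-confirmed", "published"},
			"published":         {},
		},
	}
}

func main() {
	sm := NewStateMachine(exampleTransitions())
	fmt.Println(sm.StatusFor("example-type", "associated", "edition-confirmed")) // 200
	fmt.Println(sm.StatusFor("example-type", "published", "associated"))         // 400
}
```

Keeping the table as plain data means a new dataset type can be supported by adding an entry rather than new branching logic.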
The /health endpoint checks the connection to the database and returns one of:
- success (200, JSON "status": "OK")
- failure (500, JSON "status": "error").
The /health endpoint replaces /healthcheck, which now returns a 404 Not Found response.
Scripts for updating and debugging Kafka can be found in dp-data-tools.
Environment variable | Default | Description |
---|---|---|
BIND_ADDR | :22000 | The host and port to bind to |
MONGODB_BIND_ADDR | localhost:27017 | The MongoDB bind address |
MONGODB_USERNAME | | The MongoDB Username |
MONGODB_PASSWORD | | The MongoDB Password |
MONGODB_DATABASE | datasets | The MongoDB database |
MONGODB_COLLECTIONS | DatasetsCollection:datasets, ContactsCollection:contacts, EditionsCollection:editions, InstanceCollection:instances, DimensionOptionsCollection:dimension.options, InstanceLockCollection:instances_locks | The MongoDB collections |
MONGODB_REPLICA_SET | | The name of the MongoDB replica set |
MONGODB_ENABLE_READ_CONCERN | false | Switch to use (or not) majority read concern |
MONGODB_ENABLE_WRITE_CONCERN | true | Switch to use (or not) majority write concern |
MONGODB_CONNECT_TIMEOUT | 5s | The timeout when connecting to MongoDB (time.Duration format) |
MONGODB_QUERY_TIMEOUT | 15s | The timeout for querying MongoDB (time.Duration format) |
MONGODB_IS_SSL | false | Switch to use (or not) TLS when connecting to MongoDB |
SECRET_KEY | FD0108EA-825D-411C-9B1D-41EF7727F465 | A secret key used for authentication |
CODE_LIST_API_URL | http://localhost:22400 | The host name for the CodeList API |
DATASET_API_URL | http://localhost:22000 | The host name for the Dataset API |
DOWNLOAD_SERVICE_URL | http://localhost:23600 | The host name for the Download Service |
IMPORT_API_URL | http://localhost:21800 | The host name for the Import API |
GRACEFUL_SHUTDOWN_TIMEOUT | 5s | The graceful shutdown timeout in seconds |
WEBSITE_URL | http://localhost:20000 | The host name for the website |
KAFKA_ADDR | localhost:9092 | The address of (TLS-ready) Kafka brokers (comma-separated values) |
KAFKA_CONSUMER_MIN_BROKERS_HEALTHY | 2 | The minimum number of consumer brokers needed |
KAFKA_PRODUCER_MIN_BROKERS_HEALTHY | 2 | The minimum number of producer brokers needed |
KAFKA_VERSION | 1.0.2 | The version of (TLS-ready) Kafka |
KAFKA_SEC_PROTO | unset (only TLS) | If set to TLS, Kafka connections will use TLS |
KAFKA_SEC_CLIENT_KEY | unset | PEM [2] for the client key (optional, used for client auth) [1] |
KAFKA_SEC_CLIENT_CERT | unset | PEM [2] for the client certificate (optional, used for client auth) [1] |
KAFKA_SEC_CA_CERTS | unset | PEM [2] of CA cert chain if using private CA for the server cert [1] |
KAFKA_SEC_SKIP_VERIFY | false | Ignore server certificate issues if set to true [1] |
GENERATE_DOWNLOADS_TOPIC | filter-job-submitted | The topic to send generate full dataset version downloads to |
HEALTHCHECK_INTERVAL | 30s | The time between calls to the healthcheck endpoints that check subsystems |
HEALTHCHECK_CRITICAL_TIMEOUT | 90s | The time taken for the health to change from warning state to critical due to subsystem check failures |
ENABLE_PRIVATE_ENDPOINTS | false | Enable private endpoints for the API |
DISABLE_GRAPH_DB_DEPENDENCY | false | Disables connection and health check for graph db |
DOWNLOAD_SERVICE_SECRET_KEY | QB0108EZ-825D-412C-9B1D-41EF7747F462 | A key specific for the download service to access public/private links |
ZEBEDEE_URL | http://localhost:8082 | The host name for Zebedee |
ENABLE_PERMISSIONS_AUTH | false | Enable/disable user/service permissions checking for private endpoints |
DEFAULT_MAXIMUM_LIMIT | 1000 | Default maximum limit for pagination |
DEFAULT_LIMIT | 20 | Default limit for pagination |
DEFAULT_OFFSET | 0 | Default offset for pagination |
OTEL_BATCH_TIMEOUT | 5s | Interval between pushes to OT Collector |
OTEL_EXPORTER_OTLP_ENDPOINT | http://localhost:4317 | URL for OpenTelemetry endpoint |
OTEL_SERVICE_NAME | dp-dataset-api | Service name to report to telemetry tools |
OTEL_ENABLED | false | Feature flag to enable OpenTelemetry |
Notes:
1. Ignored unless using TLS (i.e. KAFKA_SEC_PROTO has a value enabling TLS)
2. PEM values are identified as those starting with -----BEGIN and can use \n (sic) instead of newlines (they will be converted to newlines before use). Any other value will be treated as a path to the given PEM file.
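The PEM rule in the second note can be sketched as a small Go helper. This is an assumption-laden illustration, not the service's own code: loadPEM is a hypothetical name, and handling of unset (empty) values is deliberately left out.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// loadPEM is a hypothetical helper. A value starting with "-----BEGIN" is
// treated as inline PEM, with literal backslash-n sequences converted to real
// newlines. Any other value is treated as a path to a PEM file.
func loadPEM(value string) (string, error) {
	if strings.HasPrefix(value, "-----BEGIN") {
		return strings.ReplaceAll(value, `\n`, "\n"), nil
	}
	data, err := os.ReadFile(value)
	if err != nil {
		return "", fmt.Errorf("reading PEM file %q: %w", value, err)
	}
	return string(data), nil
}

func main() {
	// A raw string keeps the two backslash-n sequences literal, as they would
	// appear in an environment variable. The certificate body is a placeholder.
	inline := `-----BEGIN CERTIFICATE-----\nMIIB\n-----END CERTIFICATE-----`
	pem, err := loadPEM(inline)
	fmt.Println(strings.Count(pem, "\n"), err) // 2 <nil>
}
```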
Environment variable | Default | Description |
---|---|---|
GRAPH_DRIVER_TYPE | "" | string identifier for the implementation to be used (e.g. 'neptune' or 'mock') |
GRAPH_ADDR | "" | address of the database matching the chosen driver type (web socket) |
NEPTUNE_TLS_SKIP_VERIFY | false | flag to skip TLS certificate verification, should only be true when run locally |
To skip TLS certificate verification when running locally, set NEPTUNE_TLS_SKIP_VERIFY to true. See our Neptune guide for more details.
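As a rough illustration of how the three graph settings and their defaults might be read, here is a sketch that uses only the Go standard library. The GraphConfig type and loadGraphConfig function are hypothetical, and the service may well load its configuration through a different mechanism.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// GraphConfig is a hypothetical container for the graph settings listed above.
type GraphConfig struct {
	DriverType    string
	Addr          string
	TLSSkipVerify bool
}

// loadGraphConfig reads the settings through lookup (for example os.LookupEnv).
// Unset variables keep the documented defaults: "", "" and false.
func loadGraphConfig(lookup func(string) (string, bool)) (GraphConfig, error) {
	var cfg GraphConfig
	if v, ok := lookup("GRAPH_DRIVER_TYPE"); ok {
		cfg.DriverType = v
	}
	if v, ok := lookup("GRAPH_ADDR"); ok {
		cfg.Addr = v
	}
	if v, ok := lookup("NEPTUNE_TLS_SKIP_VERIFY"); ok {
		skip, err := strconv.ParseBool(v)
		if err != nil {
			return cfg, fmt.Errorf("NEPTUNE_TLS_SKIP_VERIFY: %w", err)
		}
		cfg.TLSSkipVerify = skip
	}
	return cfg, nil
}

func main() {
	cfg, err := loadGraphConfig(os.LookupEnv)
	if err != nil {
		fmt.Println("invalid graph configuration:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in, rather than calling os.LookupEnv directly, keeps the parsing logic testable without touching the process environment.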
See CONTRIBUTING for details.
Copyright © 2016-2022, Office for National Statistics
Released under MIT license, see LICENSE for details