Expose a followthemoney-store to a readonly FastAPI
An example instance is deployed here: https://api.investigraph.dev
The api features filtering for entity Schema
and Property
values, sorting
and a search endpoint to query against a sqlite full-text index
that contains the values for name type
properties.
By default, the whole dataset(s) are loaded into an in-memory sqlite to provide very fast and cached responses.
For mid-scale datasets (up to 1GB json dump, didn't test bigger ones yet) the api is incredibly fast when using the in-memory sqlite.
There are two main api endpoints:
- Retrieve a single entity based on its id and dataset, optionally with inlined
adjacent entities:
/{dataset}/entities/{entity_id}
- Retrieve a list of entities based on filter criteria and sorting, with
pagination:
/{dataset}/entities?{params}
- Search for entities (by their name property types) via
Sqlite FTS:
/{dataset}/entities?q=<search term>
- Search for entities (by their name property types) via
Sqlite FTS:
Two more endpoints for catalog / dataset metadata:
- Catalog overview:
/catalog
- Dataset metadata:
/{dataset}
Retrieve a paginated list of entities for the given dataset based on filter criteria.
Optionally inline (nest) adjacent entities.
Entities can be "dehytrated", that means only their featured properties are returned. This is e.g. useful for static site builders to reduce the data amount.
/{dataset}/entities?schema=Company?country=de
Filtering works for all FollowTheMoney properties
{
"id": "NK-A7z....",
"schema": "LegalEntity",
"properties": {
"name": [ "Jane Doe" ]
}
}
Could be queried like this:
/{dataset}/entities?name__ilike=%jane%
For ftm properties: ?order_by={prop}
(descending: /?order_by=-{prop}
)
Numeric
property types are casted via sqlite CAST(value AS NUMERIC)
(ignoring
errors, results in 0) before sorting, and the first property in the value
array is used as the sorting value. (The entity property dict remains
uncasted, aka all properties are multi values as string)
Add q=<term>
GET parameter to the query.
Per default, names, countries, identifiers and featured entity properties are
indexed. See below for INDEX_PROPERTIES
setting.
Aggregations sum
, min
, max
, avg
are performed via sqlite, and work for
both properties and arbitrary extra data. The endpoint accepts all other
parameters from the entities endpoint as well (+ search term via q
).
/{dataset}/aggregate?schema=Payment&aggSum=amount&aggAvg=amount&aggMin=date&aggMax=date
{
"total": 143598,
"query": {
"limit": 100,
"page": 1,
"schema": "Payment",
"order_by": null,
"prop": null,
"value": null
},
"url": "...",
"aggregations": {
"amount": {
"avg": 8909.264242607847,
"sum": 1279352526.7100017
},
"date": {
"min": "2007",
"max": "2021"
}
}
}
FTM_STORE_URI=postgres:///ftm gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000 ftmstore_fastapi.api:app
This loads the datasets in the ftm store provided by FTM_STORE_URI
into the in-memory sqlite (1 per worker) and exposes the api at http://localhost:8000
env vars and their defaults
FTM_STORE_URI=followthemoney.store
CATALOG=None # optional specify catalog metadata file
EXPOSE_DATASETS="*" # restrict exposed datasets to comma separated list
BUILD_API_KEY=secret-key-for-build # an api key for static site builders to increase limits
ALLOWED_ORIGIN=http://localhost:3000 # cors, comma-separated origins
CACHE=0 # set 1 to use redis
CACHE_TIMEOUT=0 # infinite
REDIS_URL=redis://localhost:6379
DEFAULT_LIMIT=100 # results per page
SQLITE_IN_MEMORY=1 # 0 to disable
INDEX_PROPERTIES="" # comma-separated additional properties to add to the FTS index, e.g. : "keywords,notes"
# for api docs rendering:
TITLE=FollowTheMoney Store API"
CONTACT_AUTHOR
CONTACT_URL
CONTACT_EMAIL
To even improve performance, api responses can be cached in redis. Per default, this is an infinite cache.
See the example docker-compose.yml
This package is using poetry for packaging and dependencies management, so first install it.
Clone the repository to a local destination.
Within the root directory, run
poetry install --with dev
This installs a few development dependencies, including pre-commit which needs to be registered:
poetry run pre-commit install
Before creating a commit, this checks for correct code formatting (isort, black) and some other useful stuff (see: .pre-commit-config.yaml
)
Spin up dev server and populate with fixtures data:
make api
Run test & typing:
make test
make typecheck
Since March 2023, developing of this project is supported by Media Tech Lab Bayern batch #3