A unified API for the various AI applications we have built as part of the IT’s JOINTLY project, used to generate additional or missing metadata.
An OCI-compliant image can be built in one of two ways.

With Nix installed (with flakes support), the image can be copied directly to the Docker daemon through

```sh
nix run "github:openeduhub/python-kidra#docker.copyToDockerDaemon"
```

or to Podman through

```sh
nix run "github:openeduhub/python-kidra#docker.copyToPodman"
```
The image can also be built without a local Nix installation through bootstrapping. For this, another Docker image, containing a Nix installation with flakes support, will be used. Make sure to be inside the repository before running build.sh; it will not work otherwise.

```sh
git clone https://github.com/openeduhub/python-kidra.git
cd python-kidra
sh build.sh
```

The image will be available as `result`.
- Note: in order to reduce the amount of redundant building in future build processes, a persistent build container `kidra-builder` is created as part of the script. This container contains a cache of all artifacts used in previous builds. While it is safe to remove it afterward, doing so will cause a full re-build the next time the script is run.
Now, load the created image through

```sh
docker load -i result
```

A message will appear to confirm that the image has been loaded, including its name and version. Now, start the service through

```sh
docker run -p 8080:8080 python-kidra:<version>
```
This service can also be run and installed as a native Nix Flake application. In particular, the following command will run the service locally:

```sh
nix run "github:openeduhub/python-kidra"
```
We provide two additional versions of the application – one with CUDA support and one with fewer dependencies (i.e. no bundled web browsers). Note that the latter will disable some features.
- CUDA support can be accessed through the Nix Flake endpoints with the `with-cuda` suffix, like `docker-with-cuda`, `python-kidra-with-cuda`, or simply `with-cuda`:

  ```sh
  # enter development environment with CUDA
  nix develop "github:openeduhub/python-kidra#with-cuda"
  # run webservice with CUDA
  nix run "github:openeduhub/python-kidra#with-cuda"
  # build docker image with CUDA
  nix run "github:openeduhub/python-kidra#docker-with-cuda.copyToDockerDaemon"
  ```
  Do note that the resulting application will be significantly larger (almost twice as large) and that `wlo-classification` (i.e. the `/disciplines` endpoint of the API) will not be built with CUDA support regardless. Additionally, building the application with CUDA support may take a considerable amount of time, especially when the additional caches specified in this project are not used.
- Similarly, the more minimal builds can be accessed with the `without-browsers` suffix. These save a bit more than 1 GB of space.

  ```sh
  # run more minimal webservice
  nix run "github:openeduhub/python-kidra#without-browsers"
  # build more minimal docker image
  nix run "github:openeduhub/python-kidra#docker-without-browsers.copyToPodman"
  ```
The following services are currently available from the Kidra:
- text-extraction: Extract text from URLs
- text-statistics: Calculate various metrics on text, e.g. reading time or readability
- topic-statistics: Calculate various statistics for WLO topic pages
- its-jointprobability: Bayesian model that predicts multiple metadata fields, such as school discipline or educational context
- wlo-topic-assistant: Find WLO topics in texts
- wlo-classification: Predict disciplines relevant for texts
- kea: Link relevant Wikipedia articles found in texts (requests are simply forwarded to an external service)
The service requires around 8 GB of RAM to start up. Depending on the usage of the Bayesian prediction model, this requirement may be higher – specifically, the RAM usage of predictions is directly proportional to the `num_samples` parameter. At `num_samples = 1000`, around 2 GB of additional RAM are required to process the request.
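The numbers above suggest a simple rule of thumb for provisioning memory. The following sketch is purely illustrative and not part of the Kidra codebase; it just extrapolates the documented linear scaling (roughly 2 GB per 1000 samples on top of the 8 GB baseline):

```python
def estimate_ram_gb(num_samples: int, base_gb: float = 8.0) -> float:
    """Rough peak-RAM estimate in GB for a prediction request,
    assuming the documented linear scaling: ~2 GB of additional RAM
    per 1000 samples on top of the ~8 GB start-up baseline."""
    return base_gb + 2.0 * (num_samples / 1000)

print(estimate_ram_gb(1000))  # 10.0
```

Actual memory usage will vary with the model and input, so treat this only as a starting point for sizing containers.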
Each individual service available through this API is located on its own subdomain. The input data, and potential parameters, are passed as JSON objects. Once the service is running, an interface listing all the available end-points and their documentation is available at http://localhost:8080/docs. Additionally, this service implements an OpenAPI specification, which is accessible from the `/v3/api-docs` end-point.
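As a sketch of this request pattern, a JSON `POST` request can be assembled with only the Python standard library. Note that the endpoint name `text-statistics` and the payload field `text` are assumptions for illustration; consult http://localhost:8080/docs for the authoritative schemas:

```python
import json
import urllib.request

def build_request(base_url: str, endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for a Kidra sub-service endpoint.
    Endpoint names and payload shapes are illustrative assumptions;
    the running service's /docs page lists the real ones."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("http://localhost:8080", "text-statistics", {"text": "Ein Beispieltext."})
# once the service is running, send it with: urllib.request.urlopen(req)
```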
To ensure that all Python packages with their correct versions are installed, we recommend using Nix. The development environment can be activated locally by running

```sh
nix develop
```

while inside this project.

With direnv installed, this process can be automated such that the development environment is loaded whenever the project is visited. To allow direnv to activate the environment automatically, run

```sh
direnv allow
```

while inside this project.
As a prerequisite to adding a new service to the Kidra, the service in question must implement a web-service that exposes its functionality through `POST` requests. Ideally, the service also provides an OpenAPI specification, which will then be integrated automatically. If the service shall be packaged as part of the Kidra and run as part of it, this web-service must also offer a way to specify the port on which it runs. For this, we recommend a CLI flag `--port`.
All services are added to the Kidra web-service in webservice.py. Here, you have two primary options:
- Add information about the service to `SERVICES`. Services collected in `SERVICES` will be automatically added to the web-service according to the information and parameters provided:
  - `name` – the name of the end-point in the Kidra that links to the service.
  - `autostart` – whether to automatically start the service from the Kidra. If the service shall be started automatically, it must be available to the Kidra, see Installing a new service.
  - `boot_timeout` – the number of seconds to wait for the service to start. No timeout is enforced when set to `None`.
  - `binary` – the name of the executable that is run when the service is started from within the Kidra.
  - `host` – the host to contact when trying to access the service. Should be set to `"localhost"` if the service is started as part of the Kidra.
  - `port` – the port to start the service with when automatically starting it. This is also the port that delegated requests to the service are sent to.
  - `post_subdomain` – the subdomain of the service to access when delegating a request to it.
  - `openapi_schema` – the subdomain of the service on which the OpenAPI specification is available.
- Alternatively, manually add an end-point to the FastAPI application (see https://fastapi.tiangolo.com/tutorial/first-steps/).
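A service entry combining these fields might look as follows. This is a hedged illustration only: the real `SERVICES` structure in webservice.py may be a dict, dataclass, or something else entirely, so only the field names are taken from the documentation; all values are example placeholders:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ServiceSpec:
    """Illustrative container for the fields described above; the
    actual representation in webservice.py may differ."""
    name: str
    autostart: bool
    boot_timeout: Optional[int]
    binary: str
    host: str
    port: int
    post_subdomain: str
    openapi_schema: str

example_service = ServiceSpec(
    name="text-statistics",          # end-point name in the Kidra
    autostart=True,                  # start the service from the Kidra
    boot_timeout=60,                 # wait up to 60 s; None = no timeout
    binary="text-statistics",        # executable to launch (example value)
    host="localhost",                # started as part of the Kidra
    port=8081,                       # example sub-service port
    post_subdomain="analyze",        # hypothetical request subdomain
    openapi_schema="openapi.json",   # hypothetical schema subdomain
)
```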
When a service shall be started as part of the Kidra (i.e. it is not an external service that might run on a different system), it must be added to the run-time environment.

- If the service has already been packaged in nixpkgs, no further work is necessary here. Otherwise, we recommend packaging the service as a Flake and providing it as an input in flake.nix (see the other sub-services, such as `text-statistics`).
- Make the binaries of the service available to the Kidra in `makeWrapperArgs` of the build specification of `python-kidra` (package.nix). Additionally, add an overlay that provides the package in flake.nix.