The Open Data Hub Mobility data collectors, historically also called Big Data Platform data collectors or data providers, were all contained in this repository, which is why it is named bdp-commons.
This repository contains the source code of all data collectors. These are Java workers that connect to a remote data source, such as an API, MQTT broker, or FTP server, download data, aggregate and enrich it, and finally send it to the Big Data Platform writer, which stores it in a Postgres database.
We use Keycloak for authentication against the Open Data Hub writer API.
These instructions will get you a copy of the project up and running on your
local machine for development and testing purposes. These are general
guidelines; for specific details, refer to the README.md file in each folder.
To build the data collector project, the following prerequisites must be met:
- Java JDK 1.8 or higher (e.g. OpenJDK)
- Maven 3.x
- The Open Data Hub Mobility Writer (aka Big Data Platform Core) installed
If you want to run the application using Docker, the environment is already set up with all dependencies for you. You only have to install Docker and Docker Compose and follow the instructions in the dedicated section.
Hint: To be sure you have the correct Java version and a build environment equal to our infrastructure, use the provided Docker configuration.
Get a copy of the repository:
git clone https://github.com/noi-techpark/bdp-commons.git
Change directory:
cd bdp-commons/data-collectors/[your-collector]
Build the project:
mvn clean package
The unit tests can be executed with the following command:
mvn clean test
- Inside the corresponding data collector folder, copy .env.example to .env and configure it
- Run docker-compose up -d
- You can follow the output with docker-compose logs -f
Please refer to the README.md inside that folder for further details, and
report any issues to [email protected].
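The Docker steps above can be wrapped in a small helper so that an already configured .env is never overwritten. This is a minimal sketch and not part of the repository: the function name prepare_env is hypothetical, and it assumes the current directory is a data collector folder that ships an .env.example.

```shell
# Hypothetical helper (not part of the repository): create .env from
# .env.example, but leave an already configured .env untouched.
prepare_env() {
  if [ ! -f .env.example ]; then
    echo "no .env.example found in $(pwd)" >&2
    return 1
  fi
  if [ -f .env ]; then
    echo ".env already exists, leaving it unchanged"
  else
    cp .env.example .env
    echo "created .env, edit it before starting the containers"
  fi
}

# Usage, from inside data-collectors/[your-collector]:
#   prepare_env && docker-compose up -d && docker-compose logs -f
```

Checking for an existing .env first keeps local settings safe when the helper is run more than once.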
Copy the following configuration to .vscode/launch.json:
{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "java",
      "name": "Attach",
      "request": "attach",
      "hostName": "0.0.0.0",
      "port": "9000",
      "justMyCode": false
    }
  ]
}
Run docker-compose up -d inside the data collector folder of your choice, and
then launch Attach from VSCode. You are now ready to set breakpoints and debug.
Change directory into the data collector you want to run.
You can set the parameters directly as environment variables (see
.env.example) and start it as follows:
- Newer data collectors are Spring Boot applications
mvn spring-boot:run
...or, if you want to use your own Spring profile:
cd data-collectors/[your-collector]
cp src/main/resources/application.properties src/main/resources/application-local.properties
# Now open src/main/resources/application-local.properties and modify values as you like
mvn -D spring.profiles.active=local spring-boot:run
- Older data collectors are Spring applications with an additional Tomcat Maven plugin:
mvn tomcat:run \
-DPARAM1=... \
-DPARAM2=... \
-DPARAM3=...
...or set them inside the relevant .properties files directly (see the
corresponding README.md for details), and run:
mvn tomcat:run
You do not need special credentials for local development. Use the following
Keycloak OAuth parameters inside application.properties to get started
immediately (some data collectors already have them as defaults):
authorizationUri=https://auth.opendatahub.testingmachine.eu/auth
tokenUri=https://auth.opendatahub.testingmachine.eu/auth/realms/noi/protocol/openid-connect/token
BASE_URI=http://localhost:8999/json
clientId=odh-mobility-datacollector-development
clientName=odh-mobility-datacollector-development
clientSecret=7bd46f8f-c296-416d-a13d-dc81e68d0830
scope=openid
If you develop with Docker, find the corresponding variable names in the .env
file of each data collector. Unfortunately, these names were not standardized
in the past.
If you want to test it on our infrastructure directly, please read about Credentials in our Contributor Guidelines.
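To check the development parameters by hand, a token can be requested from the tokenUri listed above. The following is a sketch under stated assumptions: it assumes the client uses the OAuth 2.0 client-credentials grant and that the test server is reachable. The function names are hypothetical and not part of the repository.

```shell
# Values copied from the development parameters listed above.
TOKEN_URI="https://auth.opendatahub.testingmachine.eu/auth/realms/noi/protocol/openid-connect/token"
CLIENT_ID="odh-mobility-datacollector-development"
CLIENT_SECRET="7bd46f8f-c296-416d-a13d-dc81e68d0830"

# Hypothetical helper: build the form body of the token request.
# Assumption: client-credentials grant. The values are inserted as-is, so
# they must not contain characters that need URL encoding.
token_request_body() {
  printf 'grant_type=client_credentials&client_id=%s&client_secret=%s&scope=openid' "$1" "$2"
}

# Hypothetical helper: POST the form body to the token endpoint and print
# the raw JSON response. Nothing is sent until this function is called.
fetch_token() {
  curl --silent --fail \
    --data "$(token_request_body "$CLIENT_ID" "$CLIENT_SECRET")" \
    "$TOKEN_URI"
}

# Usage:
#   fetch_token
```

If the request succeeds, the JSON response should contain an access token that the data collector would otherwise obtain on its own when it talks to the writer API.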
- Copy data-collectors/helloworld/ci-helloworld.yml to .github/workflows/ci-your-new-datacollector.yml
- Inside that file, replace all helloworld with your-new-datacollector
- Go to data-collectors/your-new-datacollector
- Check which Docker servers have had the least load recently, and choose them for
  testing and production inside the infrastructure/ansible/hosts file
- If you need to inject credentials:
  - go to GitHub Actions Secrets
  - create new credentials with keys in uppercase letters
    - either prefixed with the data collector name, if they are used only there
    - or with a generic, meaningful name, if you use them in more collectors
  - inject them in your GitHub Action YAML like ${{ secrets.HELLOWORLD_SECRET_1 }}
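The first two steps and the secret naming rule above can be sketched as shell helpers. Both function names are hypothetical and not part of the repository. The sketch assumes it is run from the repository root and that the data collector name contains no slash. Replacing hyphens with underscores in secret keys is an assumption made here because GitHub secret names may only contain alphanumeric characters and underscores.

```shell
# Hypothetical helper: create the CI workflow for a new data collector by
# copying the helloworld workflow and replacing every "helloworld" in it.
new_ci_workflow() {
  name="$1"
  src="data-collectors/helloworld/ci-helloworld.yml"
  if [ -z "$name" ] || [ ! -f "$src" ]; then
    echo "usage: new_ci_workflow <name> (from the repository root)" >&2
    return 1
  fi
  mkdir -p .github/workflows
  sed "s/helloworld/${name}/g" "$src" > ".github/workflows/ci-${name}.yml"
}

# Hypothetical helper: build an uppercase secret key prefixed with the data
# collector name. Assumption: hyphens are mapped to underscores.
secret_key() {
  printf '%s_%s\n' "$1" "$2" | tr '[:lower:]' '[:upper:]' | tr '-' '_'
}

# Usage:
#   new_ci_workflow your-new-datacollector
#   secret_key helloworld secret_1    # prints HELLOWORLD_SECRET_1
```

Review the generated workflow file afterwards; the servers in infrastructure/ansible/hosts and the injected secrets still have to be chosen by hand.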
To update a dependency in all data collectors, the quickversionbump scripts can be used.
- quickversionbump.sh: update dc-interface
- quickversionbump-generic.sh: update any dependency
- quickversionbump-min.sh: update the minimum version in the properties, if the dependency is not used but a minimum version is mandatory
Note: Read the comments in each script for further instructions.
For support, please contact [email protected].
If you want to write a new Data Collector:
- Read and follow our Getting Started guidelines
- Copy/paste the helloworld example into a new folder under data-collectors, and name that folder after your data collector
- Find TODO comments and follow their instructions
- See and alter code inside SyncScheduler.java
- Start the writer API locally and test everything:
  - Writer API with Docker
  - Start your data collector
  - Check log outputs of the writer and the data collector to find issues
  - Connect to the DB, and see what is there after some tests
- Create a pull request as described in the guidelines above
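The copy step and the search for TODO comments can be sketched as follows. The function name scaffold_collector is hypothetical and not part of the repository, and the sketch assumes it is run from the repository root.

```shell
# Hypothetical helper: copy the helloworld example into a new folder under
# data-collectors and list the TODO comments that still need attention.
scaffold_collector() {
  name="$1"
  src="data-collectors/helloworld"
  dst="data-collectors/${name}"
  if [ -z "$name" ] || [ ! -d "$src" ]; then
    echo "usage: scaffold_collector <name> (from the repository root)" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    echo "${dst} already exists" >&2
    return 1
  fi
  cp -R "$src" "$dst"
  # grep exits with status 1 when no TODO is found; that is not an error here.
  grep -rn "TODO" "$dst" || true
}

# Usage:
#   scaffold_collector your-new-datacollector
```

Each line of the output names a file and line number with a TODO comment, which gives a checklist to work through before starting the writer API and testing.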
More documentation can be found at https://docs.opendatahub.com.
The code in this project is licensed under the GNU Affero General Public License, Version 3.
See the LICENSE file for more information.
This project is REUSE compliant. More information about the usage of REUSE in NOI Techpark repositories can be found here.
Since the CI for this project checks for REUSE compliance, you might find it useful to check for compliance locally with a pre-commit hook. The pre-commit-config file in the repository root is already configured to do this with the help of the pre-commit tool.
Install the tool by running:
pip install pre-commit
Then install the pre-commit hook via the config file by running:
pre-commit install