Home
- Main concepts
- How to install
- How to upgrade
- How to access the agent
- Basic flow
- How to set up multiple instances of StreamSets
- Monitoring
- Troubleshooting
- Frequently asked questions
- Source - where your data is pulled from.
- Destination - where your data is sent. Available destinations: HTTP client (the Anodot REST API endpoint).
- Pipeline - pipelines connect sources and destinations with data processing and transformation stages.
- Raw pipeline - a pipeline that pulls data from a data source and saves it to your local filesystem without any transformations. The output directory can be configured via the LOCAL_DESTINATION_OUTPUT_DIR environment variable in the agent docker container; by default it is /usr/src/app/local-output.
- Take data from a source
- If the destination is an HTTP client, every record is transformed into a JSON object according to the specs of the Anodot 2.0 metric protocol
  - Values are converted to floating-point numbers
  - Timestamps are converted to UNIX timestamps in seconds
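The two conversions above can be sketched in Python. This is only an illustration under stated assumptions: the function name `to_metric`, the field names, and the output keys (`dimensions`, `value`, `timestamp`) are hypothetical and are not the exact Anodot 2.0 metric protocol layout; the input timestamp is assumed to be a UTC string.

```python
import json
from datetime import datetime, timezone


def to_metric(record, value_field, timestamp_field, timestamp_format):
    """Convert one source record into a JSON-ready dict.

    The output keys are hypothetical and only illustrate the conversions;
    they are not the exact Anodot 2.0 metric protocol schema.
    """
    # Values are converted to floating-point numbers.
    value = float(record[value_field])

    # Timestamps are converted to UNIX timestamps in seconds.
    # Assumption of this sketch: the source timestamp is a UTC string.
    parsed = datetime.strptime(record[timestamp_field], timestamp_format)
    timestamp = int(parsed.replace(tzinfo=timezone.utc).timestamp())

    # Assumption of this sketch: every remaining field is a dimension.
    dimensions = {
        key: str(val)
        for key, val in record.items()
        if key not in (value_field, timestamp_field)
    }
    return {"dimensions": dimensions, "value": value, "timestamp": timestamp}


# Hypothetical record, as it might come from a database row.
row = {"host": "web-1", "cpu": "42", "time": "2021-01-01 00:00:00"}
metric = to_metric(row, "cpu", "time", "%Y-%m-%d %H:%M:%S")
print(json.dumps(metric))
```

Keeping the value and timestamp conversions in one place makes it easy to see which record fields end up as the metric value, the timestamp, and the dimensions.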
For more details regarding each integration, please go to the dedicated page for that integration.
Monitoring Apps and tools:
- Cacti
- Observium
- PRTG
- Prometheus
- Splunk
- Solarwinds
- VictoriaMetrics, Thanos, Prometheus
- Zabbix
Pub/Sub:
- Kafka
Files & Logs:
- Coralogix
- Directory (Files)
- Elasticsearch
- RRD
Databases:
- Clickhouse
- Databricks
- Impala
- InfluxDB
- MongoDB
- MSSQL
- MySQL
- Oracle
- PostgreSQL
Other:
- Sage
- SNMP
Required resources depend on the amount of data you want to stream. In general, the agent can process ~1000 events per second (eps) per 1.5 vCPU. That means that to process 10,000 eps you need 15 vCPU.
Memory allocation depends on the number of pipelines you create. Each pipeline represents a single query to run (or a Kafka topic to consume) and requires 300-500 MB. So a standard server with 8 GB of memory should run no more than 25 pipelines.
Minimum requirement: 6 GB RAM, 4 vCPU
Standard recommendation: 8 GB RAM, 12 vCPU
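The sizing rules above are simple arithmetic, sketched here in Python. The function and constant names are hypothetical; the numbers are the ones quoted in the text (about 1000 eps per 1.5 vCPU, 300-500 MB per pipeline). The estimate covers pipelines only and does not account for memory used by the rest of the system.

```python
EPS_PER_UNIT = 1000            # events per second handled by one unit
VCPU_PER_UNIT = 1.5            # vCPU needed for one unit
PIPELINE_MEMORY_MB = (300, 500)  # per-pipeline memory range


def estimate_vcpu(events_per_second):
    """Rough vCPU estimate for a given event rate."""
    return events_per_second / EPS_PER_UNIT * VCPU_PER_UNIT


def estimate_pipeline_memory_mb(pipeline_count):
    """Return the (low, high) memory estimate in MB for the pipelines."""
    low, high = PIPELINE_MEMORY_MB
    return pipeline_count * low, pipeline_count * high


print(estimate_vcpu(10_000))            # 15.0
print(estimate_pipeline_memory_mb(25))  # (7500, 12500)
```

Note that 25 pipelines fit into 8 GB only at the lower end of the per-pipeline range (25 x 300 MB = 7500 MB), so treat 25 as an upper bound and not as a target.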
- Docker & docker-compose. Docker installation guide
- An active Anodot account (the data destination).
Note: if you're going to work with the agent using the REST API, you need to forward port 80 from the agent docker container to your host machine. To do that, uncomment the ports section for the agent service in the docker-compose.yaml file. You can change the target port from 8080 to any other port if needed.
- Download agent.zip
- Run unzip agent.zip
- Run ./agent.sh install
Increase the Java heap size in SDC_JAVA_OPTS in docker-compose.yaml if you plan to run a lot of pipelines.
To disable source validation and data preview, use the VALIDATION_ENABLED environment variable.
./agent.sh run
AGENTPOD=$(kubectl get pod -l app.kubernetes.io/name=anodot-agent -o jsonpath="{.items[0].metadata.name}")
kubectl exec -it "$AGENTPOD" -- bash
The first command looks up the agent pod name automatically. If you specify a pod name manually (for example streamsets-agent-0), replace it with the actual pod name.
- When upgrading to version >=3.14.0: the user in the agent container was changed from root to agent. This user is not allowed to run processes on port 80, so the listening port for the agent API must be changed:
  - Add the environment variable LISTEN_PORT: 8080 to the agent container (a docker-compose example is in the installation section).
  - For every StreamSets instance, run agent streamsets edit STREAMSETS_URL and change the port in the agent URL to 8080 (this will cause all pipelines to be updated).
- When upgrading from a version >2.0.1:
  - Sequentially run the scripts from the src/agent/scripts/upgrade/ directory using the command docker exec -i anodot-agent python src/agent/scripts/upgrade/<script_name> (don't run scripts whose version is less than or equal to your current agent version).
  - Run docker exec -i anodot-agent agent pipeline update
- When upgrading from a version <2.0.0:
  - Upgrade to 1.18.1 first.
  - Install a Postgres database alongside the agent (refer to the docker-compose or Kubernetes installation instructions).
  - Run docker exec -i anodot-agent python src/agent/scripts/migrate-to-db.py
  - Sequentially run the scripts from the src/agent/scripts/upgrade/ directory using the command docker exec -i anodot-agent python src/agent/scripts/upgrade/<script_name> (don't run scripts whose version is less than or equal to your current agent version).
  - Run docker exec -i anodot-agent agent pipeline update
- If you upgrade from a version <1.15.0, execute the agent destination command before running agent update.
- If you upgrade from a version <1.6.0, Kafka pipelines will be deprecated. They will keep running, but you won't be able to update them. You will need to delete the pipelines, delete the sources, and recreate them with the new config according to the documentation.
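Choosing which upgrade scripts to run is a version comparison, and comparing versions as plain strings goes wrong (for example, "2.10.0" sorts before "2.2.0"). The following Python sketch shows one way to select and order scripts. It assumes, hypothetically, that each script file name contains its version (such as 2.2.0.py); check the actual file names in src/agent/scripts/upgrade/ before relying on this. The function names are also hypothetical.

```python
import re


def parse_version(text):
    """Return the first dotted version found in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group().split("."))


def scripts_to_run(script_names, current_version):
    """Pick scripts newer than current_version, ordered oldest to newest."""
    current = parse_version(current_version)
    candidates = []
    for name in script_names:
        version = parse_version(name)
        # Skip scripts whose version is less than or equal to the current one.
        if version is not None and version > current:
            candidates.append((version, name))
    return [name for _, name in sorted(candidates)]


# Hypothetical file names, for illustration only.
names = ["2.10.0.py", "2.2.0.py", "2.1.0.py", "3.0.0.py", "README"]
print(scripts_to_run(names, "2.1.0"))
```

Each selected name would then be passed as <script_name> to the docker exec command from the list above, one script at a time.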
To upgrade the agent, follow the steps that match the way the agent was installed.
Docker-compose installation:
./agent.sh upgrade
Kubernetes installation:
- Set the version tag for both images
- Apply the Kubernetes config
- Attach to the agent container
- Run agent pipeline update
- Add a StreamSets instance
root@agent:/usr/src/app# agent streamsets add
Enter streamsets url: http://dc:18630
Username [admin]:
Password [admin]:
Agent external URL [http://anodot-agent]: http://anodot-agent:8080
- Create a destination.
> agent destination
Use proxy for connecting to Anodot? [y/N]: y
Proxy uri: http://squid:3181
Proxy username []:
Proxy password []:
Destination url [https://api.anodot.com]: https://api.anodot.com
Anodot data collection token: tokenhere
Anodot access key: apikey
Destination configured
You can connect to the Anodot application through a proxy. To do that, specify the proxy URI, username and password.
The destination URL is the URL of the Anodot application to which all data will be sent.
To get an Anodot data collection token, open your Anodot account and go to Settings > API tokens > Data Collection > Copy.
Add an Anodot access key. Please follow the instructions to get it.
After the destination is created, you can check that the Monitoring pipeline is running and that monitoring data is being passed to Anodot.
- Create a source
agent source create -f /path/to/source/config.json
- Create a pipeline
agent pipeline create -f /path/to/pipeline/config.json
- Run the pipeline
agent pipeline start PIPELINE_ID
- Check pipeline status
agent pipeline info PIPELINE_ID
- If errors occur - check the troubleshooting section
- Fix errors
- Stop the pipeline
agent pipeline stop PIPELINE_ID
- Reset pipeline origin
agent pipeline reset PIPELINE_ID
- Run pipeline again
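The stop, reset, and start steps above can be collected into one repeatable sequence. The Python sketch below only builds the command lines; it uses the agent subcommands shown in this section and the docker exec -i anodot-agent prefix used elsewhere on this page. The helper names and the pipeline ID are hypothetical.

```python
import shlex

# Prefix for running agent commands inside the agent container.
AGENT_PREFIX = ["docker", "exec", "-i", "anodot-agent"]


def restart_commands(pipeline_id):
    """Build the stop / reset / start / info sequence for one pipeline."""
    steps = [
        ["agent", "pipeline", "stop", pipeline_id],
        ["agent", "pipeline", "reset", pipeline_id],
        ["agent", "pipeline", "start", pipeline_id],
        ["agent", "pipeline", "info", pipeline_id],
    ]
    return [AGENT_PREFIX + step for step in steps]


def render(commands):
    """Render argument lists as shell-safe command strings."""
    return [shlex.join(command) for command in commands]


# "test" is a placeholder pipeline ID.
for line in render(restart_commands("test")):
    print(line)
```

To actually run the sequence, each argument list could be passed to subprocess.run(command, check=True), which stops at the first failing step. Keep in mind that resetting the origin makes the pipeline read data from the beginning again.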
Pipelines may not work as expected for several reasons, for example a wrong configuration or issues connecting to the destination. You can check for errors in the following places:
- agent pipeline info PIPELINE_ID - shows issues if a pipeline is misconfigured
- agent pipeline logs -s ERROR PIPELINE_ID - shows error logs if there are any
docker logs anodot-sdc
docker logs anodot-agent
docker exec -i anodot-agent cat /var/log/agent.log
It's possible to enable logging of requests to Anodot and see the exact data being sent:
- Stop the pipeline
agent pipeline stop PIPELINE_ID
- Enable logging
agent pipeline destination-logs --enable PIPELINE_ID
- Start the pipeline
agent pipeline start PIPELINE_ID
- See logs
destination_logs
- After troubleshooting, stop the pipeline and disable logging, because the logs consume a lot of space
agent pipeline destination-logs --disable PIPELINE_ID
If you're having an issue, please contact Anodot support. To help us resolve the issue faster, please send us the agent logs package. If you're using the docker-compose installation, you can generate it with ./agent.sh diagnostics-info (if this command is not available, please download the latest script from agent.zip). If the agent is installed in a Kubernetes cluster, please download and run this shell script.
No, you need to delete the pipeline and create a new one with a new name
If your data is not in UTC, when configuring the pipeline you should specify a timezone. Example:
[
  {
    "source": "test",
    "pipeline_id": "test",
    ...
    "timestamp": {
      "type": "string",
      "name": "timestamp",
      "format": "yyyy-MM-dd HH:mm:ss"
    },
    "timezone": "Asia/Dubai"
  }
]
Use tz database names (like Asia/Dubai, Europe/London, etc.) instead of fixed offsets (like GMT+05:00) so that daylight saving time is handled automatically. You can find a list of all tz database names here.
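To see why a tz database name is safer than a fixed offset, here is a small Python sketch using the standard zoneinfo module (Python 3.9+, with system time zone data available). It is not the agent's implementation; the function name is hypothetical, and the Python format string "%Y-%m-%d %H:%M:%S" is the counterpart of the yyyy-MM-dd HH:mm:ss format in the example config.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def to_unix_seconds(text, tz_name, fmt="%Y-%m-%d %H:%M:%S"):
    """Interpret a local timestamp string in tz_name, return UNIX seconds."""
    local = datetime.strptime(text, fmt).replace(tzinfo=ZoneInfo(tz_name))
    return int(local.timestamp())


def utc_offset_hours(text, tz_name, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the UTC offset (in hours) that applies at that local time."""
    local = datetime.strptime(text, fmt).replace(tzinfo=ZoneInfo(tz_name))
    return local.utcoffset().total_seconds() / 3600


# Asia/Dubai is UTC+4 all year round.
print(to_unix_seconds("2021-01-01 00:00:00", "Asia/Dubai"))

# Europe/London switches between UTC+0 in winter and UTC+1 in summer.
print(utc_offset_hours("2021-01-01 12:00:00", "Europe/London"))
print(utc_offset_hours("2021-07-01 12:00:00", "Europe/London"))
```

With a fixed offset, one of the two London timestamps would be shifted by an hour; the named zone picks the correct offset for each date.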