Utilities for Insights Results Aggregator
- Utilities for accessing REST API endpoints for selected services
- Utilities for handling messages to be consumed by aggregator
- Utilities for generating reports etc.
- Utilities for working with objects stored in AWS S3 bucket
- Monitoring tools
- Checking tools
- Database related tools
- Package manifest
These utilities are stored in the api_access subdirectory.
BASH script to retrieve results for multiple clusters (specified in URL) from the Insights Results Aggregator service.
The variable ADDRESS must be set so that it points to a running Insights Results Aggregator service instance.
BASH script to retrieve results for multiple clusters (specified in request payload) from the Insights Results Aggregator service.
The variable ADDRESS must be set so that it points to a running Insights Results Aggregator service instance.
BASH script to retrieve results for multiple clusters (specified in URL) from the Smart Proxy service.
The variable ADDRESS must be set so that it points to a running Smart Proxy service instance.
BASH script to retrieve results for multiple clusters (specified in request payload) from the Smart Proxy service.
The variable ADDRESS must be set so that it points to a running Smart Proxy service instance.
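The two request styles used by these scripts (cluster IDs in the URL versus cluster IDs in the request payload) can be sketched in Python as follows. This is a minimal illustration, not a port of the BASH scripts: the endpoint paths (/clusters/{ids}/reports and /clusters/reports) and the {"clusters": [...]} body are assumptions, so check them against the OpenAPI specification of the service you call. The address argument plays the role of the ADDRESS variable.

```python
import json
import urllib.request
from typing import List, Tuple


def url_variant(address: str, clusters: List[str]) -> str:
    """Cluster IDs encoded as a comma-separated list in the URL (assumed path)."""
    return f"{address}/clusters/{','.join(clusters)}/reports"


def payload_variant(address: str, clusters: List[str]) -> Tuple[str, bytes]:
    """Cluster IDs sent in a JSON request body (assumed path and body layout)."""
    body = json.dumps({"clusters": clusters}).encode("utf-8")
    return f"{address}/clusters/reports", body


def post_request(url: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the JSON payload."""
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )
```

For example, urllib.request.urlopen(post_request(*payload_variant("http://localhost:8080/api/v1", ["cluster-1"]))) would send the payload variant to a hypothetical local instance.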
These utilities are stored in the input subdirectory.
Anonymize input data produced by OCP rules engine.
All input files that end with '.json' are read by this script. If a file contains the 'info' key, the value stored under this key is replaced by an empty list, because this information might contain sensitive data. Output file names have the format 's_number.json', i.e. the original file name is not preserved, as it might also contain sensitive data.
python3 anonymize.py
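The anonymization step can be sketched as follows. This is a minimal sketch, not the script itself: numbering the outputs from 1 in sorted input order and writing them into a separate output directory (so outputs are never re-read as inputs) are assumptions of this example.

```python
import json
from pathlib import Path


def anonymize(data: dict) -> dict:
    """Return a copy of the rules engine output with 'info' replaced by an empty list."""
    result = dict(data)
    if "info" in result:
        # 'info' may contain sensitive data, so its content is dropped entirely.
        result["info"] = []
    return result


def anonymize_directory(input_dir: Path, output_dir: Path) -> int:
    """Anonymize all *.json files from input_dir into output_dir as s_<number>.json.

    The original file names are not reused because they might be sensitive.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for number, path in enumerate(sorted(input_dir.glob("*.json")), start=1):
        with path.open(encoding="utf-8") as infile:
            data = json.load(infile)
        target = output_dir / f"s_{number}.json"
        with target.open("w", encoding="utf-8") as outfile:
            json.dump(anonymize(data), outfile, indent=4)
        count += 1
    return count
```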
Converts outputs from OCP rule engine into proper reports.
All input files with names matching 's_*.json' (usually anonymized outputs from the OCP rule engine) are converted into a proper 'report' that can be:
- Published into Kafka topic
- Stored directly into aggregator database
This is done by inserting the organization ID, clusterName, and lastChecked attributes and by rearranging the output structure. Output files will have the following names: 'r_*.json'.
python3 2report.py
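A rough sketch of the conversion is shown below. The target layout (keys OrgID, ClusterName, LastChecked, with the original output nested under Report) and the timestamp format are assumptions about what the aggregator expects; the real script may rearrange the structure differently.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def to_report(rules_output: dict, org_id: int, cluster_name: str, last_checked: str) -> dict:
    """Wrap anonymized rules engine output into a report (assumed key names)."""
    return {
        "OrgID": org_id,
        "ClusterName": cluster_name,
        "LastChecked": last_checked,
        "Report": rules_output,
    }


def convert_directory(directory: Path, org_id: int, cluster_name: str) -> int:
    """Convert every s_*.json file in the directory into the matching r_*.json file."""
    last_checked = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    count = 0
    for source in sorted(directory.glob("s_*.json")):
        rules_output = json.loads(source.read_text(encoding="utf-8"))
        target = source.with_name("r_" + source.name[len("s_"):])
        report = to_report(rules_output, org_id, cluster_name, last_checked)
        target.write_text(json.dumps(report, indent=4), encoding="utf-8")
        count += 1
    return count
```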
This script can be used to fill in the aggregator database in the selected pipeline with data taken from test clusters.
The script performs several operations:
- Decompress input data generated by the Insights operator and stored in a Ceph/AWS bucket, and update the directory structure accordingly
- Run Insights OCP rules against all input data
- Anonymize OCP rules results
- Convert OCP rules results into a form compatible with the aggregator. These results (JSONs) can be published into Kafka using produce.sh (several times if needed)
./fill_in_results.sh archive.tar.bz org_id cluster_name
./fill_in_results.sh external-rules-archives-2020-03-31.tar 11789772 5d5892d3-1f74-4ccf-91af-548dfc9767aa
Generates messages to be consumed by Insights Results Aggregator.
This script reads an input message (which can be correct or incorrect, as needed) and generates a set of new messages derived from it. Each generated message can be modified if needed: the Org ID can be changed, the cluster ID can be changed as well, etc.
Types of possible input message modification:
- Org ID (if enabled by CLI flag -g)
- Account number (if enabled by CLI flag -a)
- Cluster ID (if enabled by CLI flag -c)
It is also possible to specify a pattern for output message filenames. For example:
generated_message_{}.json
usage: gen_messages.py [-h] [-i INPUT] [-o OUTPUT] [-r REPEAT] [-g] [-a] [-c] [-v]
optional arguments:
-h, --help show this help message and exit
-i INPUT, --input INPUT
Specification of input file
-o OUTPUT, --output OUTPUT
Specification of pattern of output file names
-r REPEAT, --repeat REPEAT
Number of generated files
-g, --org-id Enable organization ID modification
-a, --account-number Enable account number modification
-c, --cluster-id Enable cluster ID modification
-v, --verbose Make messages verbose
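The message derivation described above can be sketched as follows. The field names OrgID, AccountNumber, and ClusterName, the value ranges, and numbering output files from 0 are assumptions of this example; the boolean parameters mirror the -g, -a, and -c flags.

```python
import copy
import json
import random
import uuid
from typing import List, Optional


def generate_messages(template: dict, repeat: int, org_id: bool = False,
                      account_number: bool = False, cluster_id: bool = False,
                      rng: Optional[random.Random] = None) -> List[dict]:
    """Derive `repeat` messages from the template, changing only enabled fields.

    The keys OrgID, AccountNumber, and ClusterName are assumed message fields.
    """
    rng = rng or random.Random()
    messages = []
    for _ in range(repeat):
        message = copy.deepcopy(template)  # never modify the input message
        if org_id:
            message["OrgID"] = rng.randint(1, 99999999)
        if account_number:
            message["AccountNumber"] = rng.randint(1, 99999999)
        if cluster_id:
            # Version 4 UUID built from the (optionally seeded) generator.
            message["ClusterName"] = str(uuid.UUID(int=rng.getrandbits(128), version=4))
        messages.append(message)
    return messages


def write_messages(messages: List[dict], pattern: str = "generated_message_{}.json") -> None:
    """Write each message into a file named by the pattern ({} is the index from 0)."""
    for index, message in enumerate(messages):
        with open(pattern.format(index), "w", encoding="utf-8") as outfile:
            json.dump(message, outfile, indent=4)
```

Passing a seeded random.Random makes the generated set reproducible, which helps when a consumer failure needs to be replayed.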
This script reads an input message (which should be correct) and generates a set of new messages.
Each generated message is broken in some way, so such messages can be used to test how broken messages are handled on the aggregator (i.e. consumer) side.
Types of input message mutation:
- any item (identified by its key) can be removed
- new items with random key and content can be added
- any item can be replaced by new random content
Documentation: https://redhatinsights.github.io/insights-results-aggregator-utils/packages/gen_broken_messages.html
python gen_broken_messages.py input_file.json
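The three mutation types can be illustrated with the sketch below. It mutates only top-level keys and uses its own small random value helper; both are simplifications made for this example, and the real script may mutate nested items as well.

```python
import copy
import random
import string
from typing import Any, Optional


def random_value(rng: random.Random) -> Any:
    """Return a random scalar: integer, float, string, bool, or None."""
    choice = rng.randrange(5)
    if choice == 0:
        return rng.randint(-1000, 1000)
    if choice == 1:
        return rng.random()
    if choice == 2:
        return "".join(rng.choices(string.ascii_letters, k=8))
    if choice == 3:
        return rng.random() < 0.5
    return None


def break_message(message: dict, rng: Optional[random.Random] = None) -> dict:
    """Apply one mutation (remove, add, or replace an item) to a copy of the message."""
    rng = rng or random.Random()
    broken = copy.deepcopy(message)
    operation = rng.choice(("remove", "add", "replace"))
    if operation == "remove" and broken:
        del broken[rng.choice(sorted(broken))]
    elif operation == "replace" and broken:
        broken[rng.choice(sorted(broken))] = random_value(rng)
    else:
        # New item with a random lowercase key and random content.
        key = "".join(rng.choices(string.ascii_lowercase, k=10))
        broken[key] = random_value(rng)
    return broken
```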
This script reads an input message (which should be correct) and generates a set of new messages.
Each generated message is broken (it does not contain a valid JSON object), so it can be used to test how broken messages are handled on the aggregator (i.e. consumer) side.
Types of input message mutation:
- any item (identified by its key) can be removed
- new items with random key and content can be added
- any item can be replaced by new random content
usage: gen_broken_jsons.py [-h] -i INPUT [-o OUTPUT] [-e EXPORTED] [-v] [-s]
[-a] [-d] [-m] [-ap ADD_LINE_PROBABILITY]
[-dp DELETE_LINE_PROBABILITY]
[-mp MUTATE_LINE_PROBABILITY]
optional arguments:
-h, --help show this help message and exit
-i INPUT, --input INPUT
name of input file
-o OUTPUT, --output OUTPUT
template for output file name (default out_{}.json)
-e EXPORTED, --exported EXPORTED
number of JSONs to be exported (10 by default)
-v, --verbose make it verbose
-s, --shuffle_lines shuffle lines to produce improper JSON
-a, --add_lines add random lines to produce improper JSON
-d, --delete_lines delete randomly selected lines to produce improper
JSON
-m, --mutate_lines mutate lines individually
-ap ADD_LINE_PROBABILITY, --add_line_probability ADD_LINE_PROBABILITY
probability of new line to be added (0-100)
-dp DELETE_LINE_PROBABILITY, --delete_line_probability DELETE_LINE_PROBABILITY
probability of line to be deleted (0-100)
-mp MUTATE_LINE_PROBABILITY, --mutate_line_probability MUTATE_LINE_PROBABILITY
probability of line to be mutated (0-100)
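The line-level operations behind the -s, -a, -d, and -m options can be sketched as below, with probabilities expressed in the same 0-100 range as the -ap, -dp, and -mp options. The shape of the added random lines and the single-character mutation are assumptions of this example.

```python
import json
import random
import string
from typing import List, Optional


def break_json_lines(text: str, shuffle: bool = False, add_probability: int = 0,
                     delete_probability: int = 0, mutate_probability: int = 0,
                     rng: Optional[random.Random] = None) -> str:
    """Corrupt pretty-printed JSON line by line; probabilities are in 0-100."""
    rng = rng or random.Random()
    lines: List[str] = []
    for line in text.splitlines():
        if rng.randrange(100) < delete_probability:
            continue  # drop this line entirely
        if line and rng.randrange(100) < mutate_probability:
            # Replace one character at a random position with punctuation.
            position = rng.randrange(len(line))
            line = line[:position] + rng.choice(string.punctuation) + line[position + 1:]
        lines.append(line)
        if rng.randrange(100) < add_probability:
            lines.append("".join(rng.choices(string.ascii_letters + string.punctuation, k=20)))
    if shuffle:
        rng.shuffle(lines)
    return "\n".join(lines)


def is_valid_json(text: str) -> bool:
    """Helper to confirm that a generated sample really is improper JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```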
Generator of random payload for testing REST API, message consumers, test frameworks etc.
This source file contains a class named RandomPayloadGenerator that can be reused by other scripts and tools to generate random payload, which is useful for testing, implementing fuzzers etc.
This is a helper class that can't be started directly from the command line. Internally it is used by the script gen_broken_messages.py.
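To give an idea of what such a generator does, here is a small hypothetical stand-in named TinyPayloadGenerator. It is not the API of RandomPayloadGenerator; the method names, value ranges, and depth limit are all assumptions made for this sketch.

```python
import random
import string
from typing import Any, Optional


class TinyPayloadGenerator:
    """Hypothetical random payload generator (not the real RandomPayloadGenerator API)."""

    def __init__(self, max_depth: int = 3, seed: Optional[int] = None):
        self.max_depth = max_depth
        self.rng = random.Random(seed)

    def random_string(self, length: int = 8) -> str:
        return "".join(self.rng.choices(string.ascii_letters + string.digits, k=length))

    def generate(self, depth: int = 0) -> Any:
        """Return a random JSON-compatible value nested at most max_depth levels deep."""
        kinds = ["int", "str", "bool", "null"]
        if depth < self.max_depth:
            kinds += ["list", "dict"]  # containers only while depth allows
        kind = self.rng.choice(kinds)
        if kind == "int":
            return self.rng.randint(-10**6, 10**6)
        if kind == "str":
            return self.random_string()
        if kind == "bool":
            return self.rng.random() < 0.5
        if kind == "null":
            return None
        size = self.rng.randint(0, 4)
        if kind == "list":
            return [self.generate(depth + 1) for _ in range(size)]
        return {self.random_string(): self.generate(depth + 1) for _ in range(size)}
```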
These utilities are stored in the reports subdirectory.
Display statistics about rules that really 'hit' problems on clusters.
This script can be used to display statistics about rules that really 'hit' problems on clusters. It can be used against test data or production data if needed.
To run this tool against all files in the current directory that contain test data or production data:
python3 stat.py
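Counting rule hits boils down to aggregating rule IDs over all result files. The sketch below assumes that each file holds the rules engine output with hits listed under "reports" and each hit carrying a "rule_id" key; that layout is an assumption, not something stated in this document.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable, List


def count_rule_hits(results: Iterable[dict]) -> Counter:
    """Count how many times each rule reported a problem across all results."""
    hits: Counter = Counter()
    for result in results:
        for report in result.get("reports", []):
            hits[report.get("rule_id", "<unknown>")] += 1
    return hits


def load_results(directory: Path) -> List[dict]:
    """Load every *.json file in the directory."""
    results = []
    for path in sorted(directory.glob("*.json")):
        with path.open(encoding="utf-8") as infile:
            results.append(json.load(infile))
    return results
```

For example, count_rule_hits(load_results(Path("."))).most_common() lists the rules ordered by the number of hits.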
Analyze data exported from the db-writer database.
This script can be used to analyze data exported from the report table by the following command typed into the PSQL console:
\copy report to 'reports.csv' csv
The script displays two tables:
1. org id + cluster name (list of affected clusters)
2. org id + number of affected clusters (usually the only information required by management)
Usage:
affected_clusters.py rule_name input_file.csv
Example:
affected_clusters.py ccx_rules_ocp.external.bug_rules.bug_12345678.report report.csv
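The two tables can be derived with a grouping like the one below. This sketch assumes that the exported CSV columns start with org_id, cluster, and the report JSON, and that rule hits are identified by report["reports"][*]["component"]; both are assumptions about the report table, not facts from this document.

```python
import csv
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def read_rows(path: str) -> List[List[str]]:
    """Read the CSV produced by \\copy (no header line)."""
    with open(path, newline="", encoding="utf-8") as infile:
        return list(csv.reader(infile))


def affected_clusters(rows: Iterable[List[str]], rule_name: str) -> Dict[str, List[str]]:
    """Map org ID to the clusters whose report contains the given rule (assumed columns)."""
    result: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        org_id, cluster, report = row[0], row[1], json.loads(row[2])
        if any(hit.get("component") == rule_name for hit in report.get("reports", [])):
            result[org_id].append(cluster)
    return dict(result)


def print_tables(result: Dict[str, List[str]]) -> None:
    """Print table 1 (org ID + cluster) and table 2 (org ID + number of clusters)."""
    for org_id, clusters in sorted(result.items()):
        for cluster in clusters:
            print(f"{org_id:<12} {cluster}")
    print()
    for org_id, clusters in sorted(result.items()):
        print(f"{org_id:<12} {len(clusters)}")
```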
Analyze data exported from the db-writer database.
List all rules and other interesting information found in reports.csv. Data are exported into CSV format so that they can be included in spreadsheets.
This script can be used to analyze data exported from the `report` table by the following command typed into the PSQL console:
\copy report to 'reports.csv' csv
How to connect to the PSQL console:
psql -h host
Password can be retrieved from OpenShift console, for example from:
ccx-data-pipeline-qa/browse/secrets/ccx-data-pipeline-db
ccx-data-pipeline-prod/browse/secrets/ccx-data-pipeline-db
Creates plot (graph) displaying statistic about the age of rule results.
Documentation: https://redhatinsights.github.io/insights-results-aggregator-utils/packages/cluster_results_age.html
python3 cluster_results_age.py input.csv
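The statistic behind the plot can be sketched as a histogram of result ages in whole days. Parsing the timestamps out of the CSV is left out here: the sketch assumes they are already timezone-aware datetime objects, and the chart labels are illustrative.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional


def age_histogram(timestamps: Iterable[datetime],
                  now: Optional[datetime] = None) -> Dict[int, int]:
    """Map age in whole days to the number of results; timestamps must be timezone-aware."""
    now = now or datetime.now(timezone.utc)
    counts = Counter((now - timestamp).days for timestamp in timestamps)
    return dict(sorted(counts.items()))


def plot_histogram(histogram: Dict[int, int], filename: str = "ages.png") -> None:
    """Draw a bar chart; matplotlib is imported here so age_histogram works without it."""
    import matplotlib.pyplot as plt

    plt.bar(list(histogram.keys()), list(histogram.values()))
    plt.xlabel("Age of rule results (days)")
    plt.ylabel("Number of results")
    plt.savefig(filename)
```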
These utilities are stored in the s3 subdirectory.
Script to retrieve timestamp of all objects stored in AWS S3 bucket and export them to CSV.
This script retrieves timestamps of all objects stored in an AWS S3 bucket and exports these timestamps to a CSV file. It is possible to specify the S3 region, access key, and secret key.
upload_timestamps.py [-h] -k ACCESS_KEY -s SECRET_KEY [-r REGION]
[-b BUCKET] -o OUTPUT [-m MAX_RECORDS]
optional arguments:
-h, --help show this help message and exit
-k ACCESS_KEY, --access_key ACCESS_KEY
AWS access key ID
-s SECRET_KEY, --secret_key SECRET_KEY
AWS secret access key
-r REGION, --region REGION
AWS region, us-east-1 by default
-b BUCKET, --bucket BUCKET
bucket name, insights-buck-it-openshift by default
-o OUTPUT, --output OUTPUT
output file name
-m MAX_RECORDS, --max_records MAX_RECORDS
max records to export (default=all)
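A minimal sketch of the same idea with boto3 is shown below: a list_objects_v2 paginator walks the whole bucket and each object's LastModified value is written to CSV. The CSV column names are assumptions, and boto3 is imported inside the listing function so that the CSV helper can be used on its own.

```python
import csv
from typing import Iterable, Iterator, Optional, TextIO, Tuple


def list_timestamps(access_key: str, secret_key: str, bucket: str,
                    region: str = "us-east-1") -> Iterator[Tuple[str, str]]:
    """Yield (object key, last modified ISO timestamp) for all objects in the bucket."""
    import boto3  # third-party dependency, needed only for listing

    client = boto3.client("s3", region_name=region,
                          aws_access_key_id=access_key,
                          aws_secret_access_key=secret_key)
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["LastModified"].isoformat()


def export_csv(records: Iterable[Tuple[str, str]], output: TextIO,
               max_records: Optional[int] = None) -> int:
    """Write records to CSV, stopping after max_records rows (None means all)."""
    writer = csv.writer(output)
    writer.writerow(["Key", "Timestamp"])
    count = 0
    for record in records:
        if max_records is not None and count >= max_records:
            break
        writer.writerow(record)
        count += 1
    return count
```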
Script to download N tarballs from M clusters from the external data pipeline bucket.
This script is used to download tarballs stored in an AWS S3 bucket and to export the clusters and tarball paths to a CSV file.
It is fully configurable, although it expects the bucket to have the format s3://BUCKET/SUPER_FOLDER/CLUSTER/...
❯ cd s3/download_prod_data
❯ go build .
❯ ./download_prod_data --help
Usage of ./download_prod_data:
-access-key string
access key
-bucket string
bucket name
-disable-ssl
whether to disable SSL or not (default false)
-endpoint string
endpoint (leave empty to use AWS)
-n-clusters int
number of clusters (default 1)
-n-tarballs int
number of tarballs per cluster (default 1)
-output string
path to save the CSV file
-prefix string
path to the clusters folders
-region string
bucket region (default "us-east-1")
-secret-key string
secret key
You can also use go run main.go; building it is not mandatory.
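The selection logic (N tarballs from each of M clusters under PREFIX/CLUSTER/...) can be sketched in Python, independently of the Go tool. In this sketch the keys are a plain list rather than a bucket listing, and the .tar.gz suffix used to recognize tarballs is an assumption.

```python
import csv
from typing import Dict, Iterable, List, TextIO


def select_tarballs(keys: Iterable[str], prefix: str, n_clusters: int = 1,
                    n_tarballs: int = 1) -> Dict[str, List[str]]:
    """Pick up to n_tarballs keys from each of the first n_clusters cluster folders."""
    prefix = prefix.strip("/")
    base = f"{prefix}/" if prefix else ""
    selected: Dict[str, List[str]] = {}
    for key in keys:
        if not key.startswith(base) or not key.endswith(".tar.gz"):
            continue
        cluster = key[len(base):].split("/", 1)[0]
        if cluster not in selected:
            if len(selected) >= n_clusters:
                continue  # enough clusters already collected
            selected[cluster] = []
        if len(selected[cluster]) < n_tarballs:
            selected[cluster].append(key)
    return selected


def write_csv(selection: Dict[str, List[str]], output: TextIO) -> None:
    """Export one (cluster, tarball path) row per selected tarball."""
    writer = csv.writer(output)
    writer.writerow(["cluster", "path"])
    for cluster, paths in selection.items():
        for path in paths:
            writer.writerow([cluster, path])
```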
These utilities are stored in the monitoring subdirectory.
Script to retrieve memory and GC statistics from the standard Go metrics. Memory and GC statistics are exported into a CSV file to be further processed.
usage: go_metrics.py [-h] [-u URL] -o OUTPUT [-d DELAY] [-m MAX_RECORDS]
optional arguments:
-h, --help show this help message and exit
-u URL, --url URL URL to get metrics
-o OUTPUT, --output OUTPUT
output file name
-d DELAY, --delay DELAY
Delay in seconds between records
-m MAX_RECORDS, --max_records MAX_RECORDS
max records to export (default=all)
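The standard Go metrics are served in the Prometheus text exposition format, so the core of such a tool is a small parser plus a polling loop. In the sketch below, the chosen metric names are an assumption (they are common Go client metrics), and labelled samples such as go_gc_duration_seconds{quantile="0.5"} are skipped for simplicity.

```python
import csv
import time
import urllib.request
from typing import Dict, Optional, TextIO

# Columns exported to CSV; this particular selection is an assumption.
COLUMNS = ["go_memstats_alloc_bytes", "go_memstats_heap_inuse_bytes",
           "go_memstats_sys_bytes", "go_gc_duration_seconds_count", "go_goroutines"]


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse unlabelled samples from Prometheus text exposition format."""
    values: Dict[str, float] = {}
    for line in text.splitlines():
        if not line or line.startswith("#") or "{" in line:
            continue  # skip blank lines, HELP/TYPE comments, and labelled samples
        name, value = line.split()[:2]
        values[name] = float(value)
    return values


def record(url: str, output: TextIO, delay: float = 1.0,
           max_records: Optional[int] = None) -> None:
    """Poll the metrics endpoint and append one CSV row per sample."""
    writer = csv.writer(output)
    writer.writerow(["timestamp"] + COLUMNS)
    count = 0
    while max_records is None or count < max_records:
        with urllib.request.urlopen(url) as response:
            values = parse_metrics(response.read().decode("utf-8"))
        writer.writerow([time.time()] + [values.get(name, "") for name in COLUMNS])
        count += 1
        if max_records is None or count < max_records:
            time.sleep(delay)
```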
Plot a graph of Kafka lags with a linear regression line added to the plot.
The source CSV file is to be retrieved from Grafana.
kafka_lags.py input_file.csv
kafka_lags.py overall.csv
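The regression line is an ordinary least squares fit of lag against time. A dependency-free sketch is shown below; which CSV columns serve as x (for example sample index or timestamp) and y is not specified here, so that choice is left to the caller. On Python 3.10+, statistics.linear_regression computes the same slope and intercept.

```python
from typing import Sequence, Tuple


def linear_regression(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```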
These utilities are stored in the checks subdirectory.
Simple checker that verifies all JSON files have correct syntax (not schema).
Usage:
```text
usage: json_check.py [-h] [-v] [-n] [-d DIRECTORY]
optional arguments:
-h, --help show this help message and exit
-v, --verbose make it verbose
-n, --no-colors disable color output
-d DIRECTORY, --directory DIRECTORY
directory with JSON files to check
```
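A syntax-only check needs nothing beyond the standard json module: a file passes if json.loads accepts it. The sketch below reports the line and column of the first error; searching subdirectories recursively is a choice of this example and may differ from the real script.

```python
import json
from pathlib import Path
from typing import List, Tuple


def check_json_text(text: str) -> Tuple[bool, str]:
    """Return (True, "") for syntactically valid JSON, else (False, error description)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as error:
        return False, f"line {error.lineno}, column {error.colno}: {error.msg}"
    return True, ""


def check_directory(directory: Path) -> List[Tuple[Path, str]]:
    """Check every *.json file under the directory and return the failing ones."""
    failures = []
    for path in sorted(directory.rglob("*.json")):
        ok, message = check_json_text(path.read_text(encoding="utf-8"))
        if not ok:
            failures.append((path, message))
    return failures
```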
Simple checker for OpenAPI specification files.
usage: open_api_check.py [-h] [-v] [-n] [-d DIRECTORY]
optional arguments:
-h, --help show this help message and exit
-v, --verbose make it verbose
-n, --no-colors disable color output
-d DIRECTORY, --directory DIRECTORY
directory with OpenAPI JSON file to check
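A very small OpenAPI sanity check can verify that the file is valid JSON and contains the top-level fields that OpenAPI 3.0 requires (openapi, info, paths), plus info.title and info.version. This is a sketch of such a check, not a full schema validation, and the real script may check more or different things.

```python
import json
from typing import List

# Required top-level fields of an OpenAPI 3.0 document.
REQUIRED_FIELDS = ("openapi", "info", "paths")


def check_openapi(text: str) -> List[str]:
    """Return a list of problems found in an OpenAPI JSON document (empty if none)."""
    try:
        document = json.loads(text)
    except json.JSONDecodeError as error:
        return [f"invalid JSON: {error}"]
    if not isinstance(document, dict):
        return ["top-level value must be an object"]
    problems = [f"missing field '{name}'" for name in REQUIRED_FIELDS if name not in document]
    info = document.get("info", {})
    for name in ("title", "version"):
        if isinstance(info, dict) and name not in info:
            problems.append(f"missing field 'info.{name}'")
    return problems
```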
Anonymize aggregator log files by hashing organization ID and cluster ID. This tool works as a standard Unix filter.
anonymize_aggregator_log.py [-h] -s SALT
optional arguments:
-h, --help show this help message and exit
-s SALT, --salt SALT salt for hashing algorithm
anonymize_aggregator_log.py -s foobar < original.log > anonymized.log
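The filter can be sketched as a salted SHA-256 substitution applied line by line. Cluster IDs are matched as UUIDs; the org ID pattern (org_id=123 or "org_id": 123) is a hypothetical log format chosen for this example, as are the name of main() and the 16-character truncation of the hash.

```python
import hashlib
import re
import sys

UUID_PATTERN = re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")
# Hypothetical org ID format: org_id=123 or "org_id": 123.
ORG_PATTERN = re.compile(r'("?org_id"?\s*[:=]\s*)(\d+)')


def salted_hash(value: str, salt: str) -> str:
    """Stable pseudonym for a value; the same salt always gives the same output."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


def anonymize_line(line: str, salt: str) -> str:
    """Replace cluster IDs and org IDs in one log line by their salted hashes."""
    line = UUID_PATTERN.sub(lambda match: salted_hash(match.group(0), salt), line)
    return ORG_PATTERN.sub(lambda match: match.group(1) + salted_hash(match.group(2), salt), line)


def main(salt: str) -> None:
    """Unix filter mode: read lines from stdin, write anonymized lines to stdout."""
    for line in sys.stdin:
        sys.stdout.write(anonymize_line(line, salt))
```

Hashing (rather than deleting) keeps the anonymized log useful: all lines about one cluster still share the same pseudonym, so events can be correlated without revealing the real IDs.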
Anonymize CCX data pipeline log files by hashing organization ID and cluster ID. This tool works as a standard Unix filter.
anonymize_ccx_pipeline_log.py [-h] -s SALT < input.log > output.log
optional arguments:
-h, --help show this help message and exit
-s SALT, --salt SALT salt for hashing algorithm
anonymize_ccx_pipeline_log.py -s foobar < original.log > anonymized.log
These utilities are stored in the anim subdirectory.
That subdirectory contains tools to generate various animations of the Insights Results Aggregator, Insights Content Service, and Insights Results Smart Proxy architecture and data or command flows. These tools are invoked from the command line and do not accept any command line arguments (yet).
Creates animation based on static GIF image + set of programmed rules. That animation displays the data flow for the whole external data pipeline.
Specialized utility used just to create the data flow animation for the whole external data pipeline.
Go version 1.14 or newer is required to build this tool.
go build anim_external_data_pipeline.go
go run anim_external_data_pipeline.go
Creates animation based on static GIF image + set of programmed rules. That animation displays the data flow for Insights Results Aggregator consumer service.
Specialized utility used just to create https://github.com/RedHatInsights/insights-results-aggregator/blob/master/docs/assets/anim_aggregator_consumer.gif
Go version 1.14 or newer is required to build this tool.
go build anim_aggregator_consumer.go
go run anim_aggregator_consumer.go
Creates animation based on static GIF image + set of programmed rules. That animation displays data flow between Insights Results Smart Proxy and other services (internal and external ones).
Specialized utility used just to create https://redhatinsights.github.io/insights-content-service/architecture/architecture.gif
Go version 1.14 or newer is required to build this tool.
go build anim_smart_proxy.go
go run anim_smart_proxy.go
Creates animation from static GIF image + set of programmed rules.
Specialized utility used just to create https://redhatinsights.github.io/insights-results-smart-proxy/io-pulling-only.gif animation
Go version 1.14 or newer is required to build this tool.
go build insights_operator_pull_only.go
go run insights_operator_pull_only.go
Creates animation based on static GIF image + set of programmed rules. That animation displays the data flow from Insights Operator to OCP WebConsole via Prometheus metrics.
Specialized utility used just to create https://redhatinsights.github.io/insights-results-smart-proxy/io-pulling-prometheus-anim.gif animation
Go version 1.14 or newer is required to build this tool.
go build insights_operator_prometheus.go
go run insights_operator_prometheus.go
Creates animation from static GIF image + set of programmed rules.
Specialized utility used just to create https://redhatinsights.github.io/insights-results-smart-proxy/io-pulling-prometheus-anim.gif animation
Go version 1.14 or newer is required to build this tool.
go build insights_operator_to_web_console.go
go run insights_operator_to_web_console.go
Simple checker of all Python sources in the given directory (usually a repository).
This script tries to find all files with the '*.py' extension in the current directory and its subdirectories. Then it checks all those files for any style violations. Each violation is printed, and then the total number of errors is displayed as well.
To check all files in current directory and all subdirectories:
python3 run_pycodestyle.py
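The same workflow can be sketched with the pycodestyle Python API: collect the *.py files recursively, pass them to StyleGuide.check_files, and read total_errors from the returned report. This is an illustration under the assumption that pycodestyle is installed; it is imported inside check_style so that file discovery works without it.

```python
from pathlib import Path
from typing import List


def find_python_files(root: Path) -> List[Path]:
    """Return all *.py files in root and its subdirectories, sorted."""
    return sorted(path for path in root.rglob("*.py") if path.is_file())


def check_style(root: Path = Path(".")) -> int:
    """Run pycodestyle on all Python sources and return the total number of violations."""
    import pycodestyle  # third-party: pip install pycodestyle

    style = pycodestyle.StyleGuide()
    # The default report prints each violation while the files are checked.
    report = style.check_files([str(path) for path in find_python_files(root)])
    print(f"Total errors: {report.total_errors}")
    return report.total_errors
```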
These utilities are stored in the converters subdirectory.
Converts structured data from JSON format into EDN format. This script is based on the edn_format Python package, which needs to be installed by using pip or pip3.
python3 json2edn.py input.json > output.edn
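To show what the conversion produces, here is a small dependency-free serializer. It is a swapped-in sketch, not the edn_format package used by the script: null becomes nil, arrays become EDN vectors, objects become EDN maps with string keys, and strings keep JSON-style escaping.

```python
import json
from typing import Any


def to_edn(value: Any) -> str:
    """Serialize JSON-compatible data into EDN text."""
    if value is None:
        return "nil"
    if isinstance(value, bool):  # must precede the int check, bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, list):
        return "[" + " ".join(to_edn(item) for item in value) + "]"
    if isinstance(value, dict):
        return "{" + ", ".join(f"{to_edn(key)} {to_edn(item)}" for key, item in value.items()) + "}"
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

For example, to_edn(json.load(open("input.json"))) mirrors what the script writes to output.edn, modulo formatting differences from edn_format.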
Converts structured data from EDN format into JSON format. This script is based on the edn_format Python package, which needs to be installed by using pip or pip3.
python3 edn2json.py input.edn > output.json
This script can be used to perform several operations with the external data pipeline, which is usually deployed in the Stage environment and accessible through a proxy server.
The first operation retrieves the list of clusters from the external data pipeline
through the standard REST API (and optionally via a proxy server). The organization
ID needs to be provided via a CLI option, because the list of clusters is filtered
by organization. This operation is selected by using the -l command line option.
The second operation retrieves results from the external data pipeline for several
clusters. The list of clusters needs to be stored in a plain text file. The name of
this text file is to be provided by the -i command line option. This operation is
selected by using the -r command line option.
The third operation compares two sets of results. Each set needs to be stored in a
separate directory. A CSV file with a detailed comparison of the two sets is
generated during this operation. This operation is selected by using the -c
command line option.
The fourth operation retrieves processing timestamps for both sets of results and stores these timestamps into CSV files for further analysis. This operation is selected by using the -t command line option.
The REST API in the Stage environment is accessed through a proxy. The proxy address should be provided via the CLI together with the user name and password used for basic auth.
st.py [-h] [-a ADDRESS] [-x PROXY] [-u USER] [-p PASSWORD]
[-o ORGANIZATION] [-l] [-r] [-i INPUT] [-c] [-d1 DIRECTORY1]
[-d2 DIRECTORY2] [-e EXPORT_FILE_NAME] [-d] [-v] [-t]
optional arguments:
-h, --help show this help message and exit
-a ADDRESS, --address ADDRESS
Address of REST API for external data pipeline
-x PROXY, --proxy PROXY
Proxy to be used to access REST API
-u USER, --user USER User name for basic authentication
-p PASSWORD, --password PASSWORD
Password for basic authentication
-o ORGANIZATION, --organization ORGANIZATION
Organization ID
-l, --cluster-list Operation to retrieve list of clusters via REST API
-r, --retrieve-results
Retrieve results for given list of clusters via REST
API
-t, --export-times Export processing times to CSV files that can be used
for further analysis
-i INPUT, --input INPUT
Specification of input file (with list of clusters,
for example)
-c, --compare-results
Compare two sets of results, each set stored in its
own directory
-d1 DIRECTORY1, --directory1 DIRECTORY1
First directory containing set of results
-d2 DIRECTORY2, --directory2 DIRECTORY2
Second directory containing set of results
-e EXPORT_FILE_NAME, --export EXPORT_FILE_NAME
Name of CSV file with exported comparison results
-d, --additional-info
Add additional info about data pipeline components
into CSV report
-v, --verbose Make messages verbose
- Retrieve list of clusters via REST API for organization ID 12345678
st.py -l -a https://$REST_API_URL -x http://$PROXY_URL -u $USER_NAME -p $PASSWORD -o 12345678
- Read results for clusters whose IDs are stored in a file named clusters.txt
st.py -r -a https://$REST_API_URL -x http://$PROXY_URL -u $USER_NAME -p $PASSWORD -i clusters.txt
- Export processing timestamps into CSV files
st.py -t -d1=c1 -d2=c2
- Compare results stored in directories c1 and c2, results without info about the pipeline
st.py -c -d1=c1 -d2=c2
- Compare results stored in directories c1 and c2, results with info about the pipeline
st.py -c -v -d1=c1 -d2=c2 -a https://$REST_API_URL -x http://$PROXY_URL -u $USER_NAME -p $PASSWORD
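The comparison operation (-c) can be sketched as a per-cluster diff of the rules reported in each set. This sketch assumes one <cluster>.json file per cluster in each directory and hits listed under "reports" with a "rule_id" key; both the file naming and the CSV columns are assumptions, not the script's actual format.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List, Set, TextIO


def rule_ids(result: dict) -> Set[str]:
    """Rule IDs reported in one result (assumed 'reports'/'rule_id' layout)."""
    return {hit.get("rule_id", "") for hit in result.get("reports", [])}


def load_set(directory: Path) -> Dict[str, dict]:
    """Load <cluster>.json files from a directory, keyed by cluster ID."""
    return {path.stem: json.loads(path.read_text(encoding="utf-8"))
            for path in directory.glob("*.json")}


def compare_sets(set1: Dict[str, dict], set2: Dict[str, dict]) -> List[List[str]]:
    """One row per cluster: cluster, status, rules only in set 1, rules only in set 2."""
    rows = []
    for cluster in sorted(set(set1) | set(set2)):
        rules1 = rule_ids(set1.get(cluster, {}))
        rules2 = rule_ids(set2.get(cluster, {}))
        if cluster not in set1:
            status = "missing in set 1"
        elif cluster not in set2:
            status = "missing in set 2"
        else:
            status = "same" if rules1 == rules2 else "different"
        rows.append([cluster, status, " ".join(sorted(rules1 - rules2)),
                     " ".join(sorted(rules2 - rules1))])
    return rows


def export(rows: List[List[str]], output: TextIO) -> None:
    writer = csv.writer(output)
    writer.writerow(["cluster", "status", "only in set 1", "only in set 2"])
    writer.writerows(rows)
```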
Script to retrieve and analyze processing times from reports taken from external data pipeline
usage: pta.py [-h] -i INPUT_FILE [-b BIN_SIZE] [-v]
optional arguments:
-h, --help show this help message and exit
-i INPUT_FILE, --input INPUT_FILE
Specification of input file (with list of clusters,
for example)
-b BIN_SIZE, --bin-size BIN_SIZE
Bin size for histograms
-v, --verbose Make messages verbose
pta.py -i times.csv -v
pta.py -i times.csv -v -b 100
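The -b option sets the histogram bin size. The binning itself can be sketched as below: each value is assigned to the bin whose lower bound is the largest multiple of the bin size not exceeding it. Reading the processing times out of times.csv is left out, because its column layout is not described here.

```python
from collections import Counter
from typing import Dict, Iterable


def histogram(values: Iterable[float], bin_size: float) -> Dict[float, int]:
    """Count values per bin; each key is the lower bound of its bin."""
    if bin_size <= 0:
        raise ValueError("bin size must be positive")
    counts = Counter((value // bin_size) * bin_size for value in values)
    return dict(sorted(counts.items()))
```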
Prepares script to cleanup old results from database.
This script can be used to analyze data exported from the report table by the following command typed into the PSQL console:
\copy report to 'reports.csv' csv
The script retrieves all reports older than the specified number of days. Then it creates an SQL script that can be run by an administrator against the selected database.
Documentation: https://redhatinsights.github.io/insights-results-aggregator-utils/packages/cleanup_old_results.html
How to connect to the PSQL console:
psql -h host
Password can be retrieved from the OpenShift console, for example from:
ccx-data-pipeline-qa/browse/secrets/ccx-data-pipeline-db
ccx-data-pipeline-prod/browse/secrets/ccx-data-pipeline-db
cleanup_old_results.py offset_in_days input_file.csv > cleanup.sql
Example: create a script to clean up all records older than 90 days:
cleanup_old_results.py 90 report.csv > cleanup.sql
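Generating the cleanup script amounts to filtering rows by age and emitting one DELETE statement per old report. In this sketch each row is assumed to be (org_id, cluster, reported_at) with a timezone-aware timestamp; the column choice and the exact SQL form are assumptions. Cluster names are quoted as SQL literals with embedded single quotes doubled.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple


def sql_literal(value: str) -> str:
    """Quote a string as an SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def cleanup_statements(rows: Iterable[Tuple[int, str, datetime]], offset_in_days: int,
                       now: Optional[datetime] = None) -> List[str]:
    """Build DELETE statements for reports older than offset_in_days.

    Each row is (org_id, cluster, reported_at); this column choice is an assumption.
    """
    now = now or datetime.now(timezone.utc)
    threshold = now - timedelta(days=offset_in_days)
    return [
        f"DELETE FROM report WHERE org_id = {int(org_id)} AND cluster = {sql_literal(cluster)};"
        for org_id, cluster, reported_at in rows
        if reported_at < threshold
    ]
```

Producing a script rather than deleting directly keeps a human in the loop: the administrator can review cleanup.sql before running it against the selected database.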
Package manifest is available at docs/manifest.txt.