This is a visualization framework for online performance analysis. It focuses on visualizing real-time anomalous behaviors in a High Performance Computing application, so that anomaly patterns users might otherwise miss can be detected effectively through online visual analytics.
This framework provides five major features:
- Streaming data reduction and aggregation
- Sliding time window of workflow overview for regular and anomaly function executions
- Selected function execution in a reduced call stack tree structure
- Selected function execution and message passing details in zoomable timelines
- Overall events distribution by rank
The following four visualization components are provided:
- Projection of Function Execution
- Selected Call Stack Tree
- Timeline and Message Visualization
- Execution Distribution by Rank
This framework is a web application: the back-end is built with Python 3.x and Flask, and the front-end is developed with JavaScript and D3.js.
Please be advised that this framework is containerized as a single Docker image, so there is no need to spend time resolving software dependencies. The Docker repository is currently private; access information will be added soon.
To install this framework manually, make sure that Python 3.x and pip3 are installed on your machine.
If they are already installed, NumPy and Flask can be installed by:
$ pip3 install numpy
$ pip3 install flask
Now, please clone the repository of this project to run the visualization framework by:
$ git clone [TBA]
Before starting a visualization server, the port number can be changed if needed by editing main.py:
port = 5000 # replace 5000 with your preferred port
Start visualization server:
$ python3 main.py
Once the visualization server is running, it is reachable at localhost:5000 by default.
The visualization server accepts POST requests to localhost:5000/events with JSON-encoded data in the request body. Please send the following requests in the following order.
- Reset/initialize the server:
{
  "type": "reset"
}
- Provide function names, in order:
{
  "type": "functions",
  "value": ["function", "names", "in", "string", "type"]
}
- Provide event types, in order:
{
  "type": "event_types",
  "value": ["event", "types", "in", "string", "type"]
}
- Provide trace information with the function of interest (foi) and anomaly labels (labels):
{
  "type": "info",
  "value": {
    "events": [],
    "foi": [],    // a list of indices based on function names
    "labels": []  // a list of lineid
  }
}
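The four-step request sequence above can be sketched with Python's standard library as follows. The function names, event types, and empty event lists below are placeholders for illustration, not real trace data, and the server is assumed to be running at the default localhost:5000:

```python
import json
import urllib.request

URL = "http://localhost:5000/events"  # default port from main.py

def post(payload):
    """POST a JSON payload to the /events endpoint (requires a running server)."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# The four requests, in the order the server expects them.
# Function names and event types here are placeholders.
requests_in_order = [
    {"type": "reset"},
    {"type": "functions", "value": ["main", "compute", "mpi_send"]},
    {"type": "event_types", "value": ["ENTRY", "EXIT", "SEND", "RECV"]},
    {"type": "info",
     "value": {"events": [],   # trace events
               "foi": [],      # indices into the function-name list
               "labels": []}}, # anomaly labels
]

# With the server running, send them one by one:
# for payload in requests_in_order:
#     post(payload)
```

Note that the "reset" request must come first, since it initializes the server state that the later requests populate.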
Currently, parserNWChem.py can be used for an offline demo:
$ cd web
$ python3 parserNWChem.py
Trace events are visualized on a scatter plot in a streaming fashion. If any events are detected as anomalies, they are highlighted by rendering them larger than normal executions.
More visual patterns can be recognized by adjusting the axes, for example: rank vs. execution time, entry time vs. exit time, etc.
Various filtering options are also provided, so that particular functions or only the anomalous data can be easily observed.
For a particular function execution, clicking the corresponding data point on the scatter plot shows detailed call stack information.
The detailed timeline and message communication of a specific function can also be visualized. Dragging the timeline will focus on a particular interval.
The overall distribution of executions can also be identified. Moreover, domain scientists can easily identify which processor was problematic at which moment by visualizing the number of anomalies per processor.
Automated unit tests are provided to make sure that the basic functionalities work as expected. Please run the following commands to build the automated script first:
$ cd [PATH/TO/ROOT/OF/PROJECT]
$ make
After creating the script, the unit tests can be run by:
$ make test
A report of the test cases will be printed:
----------------------------------------------------------------------
Ran 4 tests in 14.663s
OK