This repository contains configuration files and static web files for the visualization hub of the GROS project. The proxy is intended to provide access to a Ghost blog and Discourse forum if they are deployed within the environment, as well as to the visualization reports for the organization, which were generated using branch-based builds on a Jenkins instance.
The repository needs to be configured to work within a specific environment. Two stages of reverse proxies are used to safely provide access to the GROS webservers (separate Docker instances, a Jenkins setup and/or direct access). The reverse proxies are as follows:
- Caddy for transparent proxy access from a BigBoat dashboard. Several subinstances handle specific domain names. This is only used in an environment where access has to be routed through several VLANs.
- NGINX for proxy access and/or static file hosting from a central server listening on specific ports. It can also use Host-based proxying. Complete, direct hosting is possible but may require additional, separate server configuration.
- Apache as an alternative to NGINX, for proxy access and/or static file hosting from a central server listening on specific ports. It can also use Host-based proxying. Complete, direct hosting is possible but may require additional, separate server configuration.
Many of the configuration files and web files use Mustache in order to use configuration items and functions to adjust to other environments.
Additionally, this repository contains a Shell script goaccess-report.sh
that
can be run periodically to generate a server statistics report. It requires
installation of GoAccess which analyzes logs and
creates an analytics dashboard.
Separate documentation exists for more details on how the second proxy layer works.
While the build of the visualization site may work with simply having Docker installed, the test and production environments will need a more full-fledged installation of other dependencies.
For the tests, a Jenkins installation is assumed with proper OpenJDK Java 8+
and Python 3.6+ (including distutils
and virtualenv
). The agent that
performs the tests must have Docker Compose V2. In addition, tools such as
Bash, Git, jq
, awk
, sed
, grep
and xargs
must be available and
GNU-like (POSIX-compliant may not be enough). A SonarQube Scanner must be
registered in Jenkins. The server agent that performs the publishing must have
Docker Compose V2, Node.js, curl
and rsync
as well. Details for configuring
Jenkins servers and agents are outside the scope of this documentation,
although some details may be available in other GROS documentation. The
integration tests may be able to be run outside a Jenkins job, but support for
this is limited and it requires passing additional environment variables.
The deployment assumes an NGINX or Apache service and potentially a Docker Compose installation some which can be set up using (some of) the configuration files that we build from the templates in this repository.
This repository contains integration tests for the visualizations. The entire
visualization proxy layer setup is emulated within a docker-compose
network,
with each deployed visualization within its own Docker instance running its
Node.js development web service. A Caddy proxy pretends to be the Jenkins
server with access to the HTML reports and artifacts.
The tests themselves are run with a Selenium instance using a Chromium browser, which is being controlled by a Python runner. The tests are in the form of Python unit tests which instruct the Selenium instance to navigate to pages and perform actions such as clicking. The unit tests then check if certain elements are visible, contain certain text, or have other properties, and the test passes if this is the case.
The visualizations use their test builds which contain coverage tracking, which is extracted and combined after the test completes. In addition to coverage, Sonar scans check for code smells, and a dependency check searches for vulnerabilities.
At the end of each test, more actions take place. Logs are stored and a screenshot is made of the page, so that there is a visual reference of the state of the page in case a test fails. Finally, an accessibility testing engine is used to verify if the contents of the page conform to various WCAG rules. All of these results are combined as well into a report.
The visualization dashboard, connections to other servers as well as common
elements such as a navigation bar, are all configured in this repository. The
main configuration point is usually the config.json
file, which can be copied
from lib/config.json
to the root of the repository in order to make local
adjustments. The following configuration items (all strings, unless noted
otherwise) are known:
visualization_url
: The URL to the visualizations. This may include a protocol and domain name, but does not need to in case all the resources are hosted on the same domain. The remainder is a path to the root of the visualizations, where the dashboard is found and every visualization (except predictions) has sub-paths below it.prediction_url
: The URL to the prediction site. This may include a protocol and domain name, but does not need to in case all the resources are hosted on the same domain. The remainder is a path to the root of the predictions (without any specific paths below it).base_url
: The absolute URL to the base website where other static resources are located. This is used to link to the base website in the navigation bar and to make certain URLs absolute like schema identifiers and canonical URLs. If it is left empty, then some of those URLs will not have a protocol and domain name or use the current location as base.blog_url
: The URL to the blog. This may include a protocol and domain name, but does not need to in case all the resources are hosted on the same domain. The remainder is a path to the root of the blog. If this is empty, then no blog is available.discussion_url
: The URL to the discussion forum. This may include a protocol and domain name, but does not need to in case all the resources are hosted on the same domain. The remainder is a path to the root of the forum. If this is empty, then no forum is available.download_url
: The URL to default download links on the dashboard. This can be overwritten by specific download URLs by the visualizations and an altered value may work with the redirections provided by the NGINX or Apache proxy, so care should be taken when adjusting this. This may be helpful when direct access to downloads are available. If this is empty, then no download links are shown on the dashboard.jira_url
: URL that is used in some navigation bars to link to a Jira instance.blog_host
: Domain name of an internal server where the blog is hosted.blog_server
: Domain name of an external server that provides access to the blog. This name should point (possibly via a Caddy proxy) toward the NGINX or Apache proxy that makes the server available.discussion_host
: Domain name of an internal server where the discussion forum is hosted.discussion_server
: Domain name of an external server that provides access to the discussion forum. This name should point (possibly via a Caddy proxy) toward the NGINX or Apache proxy that makes the server available.visualization_server
: Domain name of an external server that provides access to the visualization dashboard and every visualization (except predictions). This name should point (possibly via a Caddy proxy) toward the NGINX or Apache proxy that makes the server available.www_server
: Domain name of an external server that listens on a "www" address. This server redirects to the visualization server. This name should point (possibly via a Caddy proxy) toward the NGINX or Apache proxy that makes the redirection service available.prediction_server
: Domain name of an external server that provides access to the predictions. This name should point (possibly via a Caddy proxy) toward the NGINX or Apache proxy that makes the server available.hub_organizations
(array): When multiple organizations are hosted in the same environment, an array of objects containing organizations, navigation data and branch names can be added to make them available cross-builds. For tests, they provide a build and serve dummy data for visualizations on multiple branches. The test routes only consider thevisualization-site
andprediction-site
as keys in each object, and their values should provide a branch name that thevisualization_branch
andprediction_branch
may respectively point to for NGINX testing, or the branch names generated usinghub_mapping
for Apache testing. The organizations objects in this array also define how items in a navigation bar dropdown should appear, with titles, content/alternative text locales, images (plus width/height/style attributes) and external URLs. Finally, a keyvisualizations
may have an array value with the visualizations that the organization's hub hosts.hub_regex
: When multiple organizations are hosted in the same environment, a regular expression can be used to match the organization name which must occur at the start of the path, and place the matched parts into variables for later use in rewrite rules of NGINX or Apache. Use(?<groupname>...)
for capturing matches for compatibility across proxy servers. To avoid renumbering issues, this regular expression should not contain unnamed match captures(...)
.hub_mapping
(object): Groups of environment variable names to replace the matched substrings from the regular expression fromhub_regex
in. Only used for Apache, when hosting multiple organizations in the same environment. Group keys can be "hub", "visualization" and "prediction", corresponding to the visualization-site itself, the visualizations and the prediction-site, respectively. Each variable within the group has an object with "input", "default", and "output" keys, which are used to build aRewriteMap
. The input and default can refer to other variables with dollar signs. The output mapping cannot contain spaces or empty strings in both the keys and values. In addition to thehub_regex
variables, the "prediction" group may also transform abranch_organization
variable to map paths to organizations in (combined) prediction setups. The mapping is also used for determining the available prediction path setups on the server for the OpenAPI specification, and for copying the published visualizations on master branches to the proper default organizations. In NGINX, one can instead alter variables captured from regular expressions usinghub_branch
,visualization_branch
andprediction_branch
.branch_maps_path
: Filesystem path where the rewrite maps for the Apache configuration are expected to reside. The map files are written tohttpd/maps
in this repository, but may be mapped to another absolute path. This affects the path within the Docker instance during thedocker-compose
paths, but can also be helpful when using (portions of) the configuration on an Apache server.hub_redirect
: When multiple organizations are hosted in the same environment, variables from a matched organization at the start of the path usinghub_regex
can be used in a rewrite that redirects to another URL. It is assumed that this configuration value produces an absolute URL. For NGINX, use dollar-sign variables; for Apache, use environment variables named withhub_mapping
with%{...}
syntax.hub_branch
: Inject some processing steps in the NGINX configuration for the visualizations hub. This should at least determine the branch of a Jenkins build to use for the visualization site. When multiple organizations are hosted in the same environment, variables from a matched organization at the start of the path usinghub_regex
can be used in further processing.visualization_branch
: Inject some processing steps in the NGINX configuration for a visualization. This should at least determine the branch of a Jenkins build to use for a visualization. When multiple organizations are hosted in the same environment, variables from a matched organization at the start of the path usinghub_regex
can be used in further processing.prediction_branch
: Inject some processing steps in the NGINX configuration for the prediction site. This should at least determine the branch of a Jenkins build to use for the visualization or prediction site. When multiple organizations are hosted in the same environment, variables from a matched organization at the start of the path usinghub_regex
can be used in further processing.jenkins_host
: Domain name of an internal server where the Jenkins build system is hosted.jenkins_path
: Path that the Jenkins build system is hosted below. This is useful in situations where Jenkins is hosted on the same domain as other resources and is therefore within a path rather than directly at the root of a domain.jenkins_direct
: Filesystem path of a location where a copy of the visualizations and predictions are available. If this is not empty, the structure within the path must follow that which thecopy.sh
script makes and supported requests are internally rewritten to this path. If this is empty, then requests for visualizations and predictions are proxied to the Jenkins server.jenkins_direct_url
: URL through which the Jenkins server is available from the location of the build (a Jenkins node with the 'publish' tag, most likely the server itself), when the visualizations and predictions are copied.jenkins_direct_cert
: Filesystem path to a certificate used for validating a HTTPS connection to the Jenkins server, when the visualizations and predictions are copied (for branch information).jenkins_api_token
: Encrypted token that can be used for basic authorization against the Jenkins API for at least branch details of builds.files_host
: Domain name of an internal server where an ownCloud instance is hosted.files_share_id
: Identifier of a published share on an ownCloud instance with files that are made available in addition to the prediction resources. If this is an empty string, then the paths to the files list and specific files are not passed through to ownCloud from the NGINX or Apache proxy.control_host
: Domain name of an internal server where secure resources are hosted, including encryption services and access control checks. This domain must be accessible through HTTPS from the NGINX or Apache proxy. This can be set to an empty string to disable proxy connections to endpoints for encrypting names of developers and checking project group access, as used in some visualizations. An empty string also makes the Caddy proxy configuration unusable, so only leave this empty if a portion of the configuration is used.websocket_server
: Domain name of an external server where a WebSocket for real-time updates of access log analytics is hosted. This name should point (possibly via a Caddy proxy) toward the NGINX or Apache proxy that makes the WebSocket service available (via the GoAccess script).proxy_nginx
(boolean): Whether to use NGINX to provide access to the other servers or host the direct files. If set to false, use Apache HTTP Server instead. Affects the test as well as which configuration is generated.proxy_range
: CIDR range of trusted IP addresses that may host the first layer of proxies in front of the NGINX or Apache proxy, for example the Caddy proxies. Requests from these addresses may provide headers with the real IP address of the original request, which are used instead of the proxy's IP address.proxy_port_in_redirect
(boolean): Whether to make the NGINX or Apache proxy specify the port number when a redirect is generated. If a Caddy proxy is in front of it, then this is most likely not wanted. Similarly, if only portions of the configuration files are used, then this is also not useful to enable.auth_cert
: Filesystem path to a certificate used for validating the HTTPS connection to thecontrol_host
. If a$SERVER_CERTIFICATE
environment is not set, then this is also used as the path on the Docker host machine during thedocker-compose
tests to provide this path within the Docker instance.allow_range
(array): CIDR ranges of IP addresses that are allowed to access the access log analytics.goaccess_path
: Filesystem path to the location where the GoAccess interface is hosted. The directory should contain ananalytics
subpath which holds the actual web interface. This can be set to an empty string to disable the GoAccess endpoint.goaccess_log_path
: Filesystem path to log file location. Access logs stored in this path (including rotated and possibly GZip-encoded logs) are followed.swagger_openapi_url
: URL prefix to use for the OpenAPI specification files that are made available in the Swagger UI. The default value of./
works well for the Docker instance, but if the OpenAPI files are meant to be stored elsewhere then another path is possible. For production builds of a static file hosting setup withjenkins_direct
, this value is set to the absolute path of the direct hosting root, and additional OpenAPI files rather than just the prediction API are made available.swagger_validator_url
: URL to use to connect to the validator. The default value of/validator
works well for the Docker compose network. If it is set to an empty string, then the online validator is used. Can be set tonone
to disable the validator. For production builds of a static file hosting setup withjenkins_direct
, we use the online validator.
Configuration items that have keys ending in _url
may be processed to direct
toward an organization-specific path, in case multiple organizations are hosted
in the same environment. The value is searched for the substring
$organization
, possibly after slashes. These can be replaced with the actual
organization that the build is for. In some cases, it is removed only to allow
NGINX or Apache rules to add a supported variant of it in front of the path
using hub_regex
, and in other cases, it may be preserved to be adjusted at
a later moment within the visualization or hub. Specific environment
variables determine how the substring $organization
is replaced.
Note that configuration items that have keys ending in _host
or _server
may
be set to "fake" values when only a portion of the final NGINX or Apache
configuration is actually used, for example when some resources are not made
available. The configuration values should still be set to valid domain names
so that they can be used within the docker-compose
network during the tests.
Several environment variables may be used when calling npm run
or the Shell
scripts, and may affect various contexts, such as resource paths in the HTML
pages, the JavaScript-based navigation bar, the proxy server configuration and
the test environment. Boolean environment variables must be set to true
in
order to take effect.
$VISUALIZATION_ORGANIZATION
: Determines the organization to use within the URLs defined in the configuration.$VISUALIZATION_COMBINED
(boolean): Determines whether to replace the organization within the URL with/combined
.$NAVBAR_SCOPE
: Determines the navigation bar override file to use. If the filenavbar.$NAVBAR_SCOPE.js
exists, then it is loaded in place of the generatednavbar.json
file. See files for more details.$VISUALIZATION_ANONYMIZED
(boolean): Determines whether to show a notification on the HTML page indicating that the visualizations contain anonymized data and that some functionality is limited or missing.$VISUALIZATION_SITE_CONFIGURATION
: Path to the main configuration file, defaulting toconfig.json
. If this file does not exist, thenlib/config.json
is used.$PREDICTION_CONFIGURATION
: Path to the configuration file of theprediction-site
visualization, which is then used during tests. By default, the prediction's default configuration is used, which may be incompatible in some complex combined/organization setups. Other visualizations are tested with default configuration.$VISUALIZATION_NAMES
: Optional space-separated list of repository names of visualizations (excludingvisualization-site
itself) that should be included in the navigation bar, proxy configuration, test environment setup, and publishing phase. By default, all visualizations listed in the filevisualizations.json
are considered.$VISUALIZATION_MAX_SECONDS
(integer): Number of seconds to wait before are visualizations are compiled and made available during the test setup. By default, 60 seconds is allocated, which may be too short for some nodes.$REPO_ROOT
: Directory to store the Git repositories of the visualizations during the test setup. Relative to the current directory. By default,repos
is created as a subdirectory. If another directory is used, then existing clones are left as-is with no up-to-dateness checks and dependencies are backed up so that they do not interfere with the test setup.$SERVER_CERTIFICATE
: Path to the HTTPS certificate to use in the test setup when requesting upstream resources for encryption and access checks. By default, this is theauth_cert
from the configuration, but this can be set to use a different path at the host device than in the Docker test setup.$PUBLISH_PRODUCTION
: A boolean variable which indicates whether to perform a copy for potential publication from the current feature branch instead of only doing so when on a main branch. Note that main branches are skipped in this case, so the publication may not have organization-specific paths.$JENKINS_HOME
: Provided by Jenkins and used by thecopy.sh
script as the root from which to collect artifacts and visualization HTML reports for publishing.$BRANCH_NAME
: Provided by Jenkins Multibranch Pipeline and used by the test environment in order to separate Docker resources when building dependencies for visualizations under test.$BUILD_NUMBER
,$BUILD_TAG
,$BUILD_URL
,$NODE_NAME
: Provided by Jenkins and used by the test environment in order to track build context for reporting and naming purposes.
Whereas the configuration file, usually located at config.json
, is likely
necessary to be copied (from lib/config.json
) and changed, there are other
files within the repository that can be modified to adjust what is available on
the visualization site. We will briefly introduce these files, as adjustments
should be considered more like code changes.
The navigation bar is configured in navbar.json.mustache
, with optional
contextual overrides in navbar.$NAVBAR_SCOPE.js
. The format is defined in the
@gros/visualization-ui
package for the Navbar
class, with the addition that
URLs can have $organization
substrings replaced, and various Mustache
operations can take place to fill the navigation bar with menus/links defined
elsewhere, such as visualizations and organizations. navbar.$NAVBAR_SCOPE.js
should, if used, refer to navbar.json
and use simple JavaScript operations to
augment it, though the use of this file may not be necessary when specifics for
an organization can be set by changing other files, such as config.json
and
visualizations.json
.
The available visualizations are configured in visualizations.json
. The
structure is based on the layout of the dashboard, but the items defined in it
also determine which visualizations are actually made available in the
navigation bar, proxy server configuration as well as within the tests. The
JSON object is used as a Mustache structure within the index template and the
navigation bar, and so they may contain Mustache items to refer to certain
configuration items, such as URLs.
Localization of the visualization site is in lib/locales.json
. The messages
in it are used when referred from JavaScript code using the Locales
class
from the @gros/visualization-ui
package, or when using the data-message
attribute within the HTML templates.
The visualization can be built using Node.js and npm
by running npm install
and then either npm run watch
to start a development server that also
refreshes browsers upon code changes, or npm run production
to create
a minimized bundle. The resulting HTML, CSS and JavaScript is made available in
the www
directory.
As mentioned in the dependencies, this repository also
contains a Dockerfile
specification for a Docker image that can perform the
installation of the app and dependencies, which allows building the
visualization site within there, removing the need for a global Node.js
installation. The Jenkinsfile
contains appropriate steps for a Jenkins CI
deployment, including the tests, visualization building and publishing.
During the build, JavaScript in order to construct a common navigation bar is
separated from the main JavaScript bundle in vendor.js
. The visualizations
and prediction site also refer to this file to display the navigation bar.
A non-static production environment can make use of the generated proxy server
configuration in order to deploy the reverse proxy layer(s) that allow access
to all the visualizations and other resources. Optionally, the caddy
docker
compose file can be set up. Next, either the main nginx.conf
plus the files
generated in the nginx
directory can be supplied to the NGINX service, or
likewise with httpd.confg
and the generated files in the httpd
directory
for the Apache service. Alternatively, a subset of these files may be used, for
example to host all under one domain, with additional local configuration.
Documentation based on JSON schema specification files from various GROS
repositories can be built with the doc
directory. This requires installing
the some Python dependencies, preferably first setting up a virtualenv
, then
running pip install -r doc/requirements.txt
. The file doc/source/conf.py
contains configuration for the documentation, including paths or URLs to the
schema files, which can be altered to use local paths. When run on the Jenkins
server to create static documentation, these paths may be defined through the
doc.sh
script which locates the JSON schemas as part of archives and modules
available in the current workspace and beyond.
A static production environment can make use of the result of the Bash script
copy.sh
when run on the Jenkins server in order to create a document root
directory with the organizational hubs, all the visualizations, JSON schemas,
OpenAPI specifications and Swagger UI available for publishing on a static file
hosting server. This server may make use of the NGINX or Apache configurations,
which in their compiled form properly route the prediction site and error
pages, among others. Like other parts, the copy.sh
script requires specific
configuration for mapping hub paths, selecting organizations,
producing paths and URLs and accessing Jenkins for the publishable
visualizations, archived files and prediction branches.