Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Influxdb configurable prefix #2586

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

jktjkt
Copy link
Contributor

@jktjkt jktjkt commented Aug 3, 2023

I'm feeding data from various weather sensors to VictoriaMetrics (and Grafana). I did not like metric names that are based on free-text model names because they make it harder to plot, say, temperature across multiple sensor types. Without this patch, I might get multiple metrics about, say, temperature:

  • Fineoffset-WS90_temperature_C,
  • Fineoffset-WN34_temperature_C.

Since this project already has a concept of unified property names, I would prefer to see metrics named like rtl433_temperature_C which would be further parametrized ("labels" in PromQL/Grafana terms) by the protocol (-M proto), sensor ID, channel #, etc.

This patch implements just that. I ended up making the pattern format string customizable a la MQTT which is one step further than what I needed, but it came essentially free, so why not. That means that I can now run rtl_433 this like:

rtl_433 -f 868.35M -M time:utc -s 300k -F 'influx://[::1]:8428/write?.,metric=rtl433' \
  -M protocol -M level -R 142 -R 221 -R 244 -R 113 -Y autolevel

...and I will get nice, unified metric names which enable me to plot a graph like this with a single metric, despite the fact that it's aggregating battery health from three different HW types:

image

Here's how this PR implements that:

  1. fix -Wstringop-truncation which would silenty truncate URLs longer than 399 characters (because my compiler bugs me about this)
  2. make the MQTT formatter reusable
  3. use the MQTT formatter by adding the metric option to the inlfux:// reporter
  4. add some additional metadata like protocol to PromQL/MetricsQL labels

There was no buffer overflow, but that use of `strncpy` can lead to
silent truncation of URLs. Let's fail loudly.
Previously, this code would simply take the "device model" and push that
as the "measurement name" to Influx. Almost all of the extracted data
was then added as key-value pairs. This meant that the measurements
extracted from "unrelated" devices, such as "internal thermometer" and
"external thermometer", did not necessarily have anything in common when
it came to metric names. It was not possible to add a simple tag such as
"rtl_433", for example.

I would like all my weatherstation-related metrics somewhat more closely
named. Originally, I simply added an option for using a fixed metric
name, but then I realized that the MQTT exporter already supports some
string substitution. Let's reuse that thing here.

In many TSDB SW (such as VictoriaMetrics), the actual metric (i.e., the
individual time series) names are built by concatenating this
"measurement name" with each "measurement data name", which means that,
e.g., a soil humidity sensor would still produce nice metrics such as
`rtl433_moisture`.
...because that's what allows easy filtering in the majority of TSDB
software. Thanks to the previous patch, these can be moved to the metric
name if desired.
{
strncpy(ctx->url, url, sizeof(ctx->url));
ctx->url[sizeof(ctx->url) - 1] = '\0';
strncpy(ctx->url, url, RTL433_INFLUX_URL_LENGTH);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about getting rid of the zero termination. We kind of know that the influx_client_t should be zero filled. But glancing at a strncpy without accompanying termination it looks like a bug.

|| !strcmp(data->key, "id")
|| !strcmp(data->key, "dst_id")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dst_id is not metadata it's part of the Somfy remote commands, there is no src_id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that these are not metadata, but to me, the real question is whether this is something that I, as a consumer of these data, should be filtering on, or whether it's some runtime data that I want to plot. In InfluxDB terms (cf. line protocol), these "filterable" items correspond to tags, in Prometheus these are called labels.

I don't use Somfy-IOC myself (it's just some neighbor's roller shutters that I hear clearly in this apartment building), but the data looked pretty static and I was thinking that it makes more sense to produce separate time series based on the (source, destination) pairs. That way users can track communication of the individual device pairs as separate series (if they choose to).

I'll move this to a separate patch, though, and if you think that it's still a wrong idea, I'm OK with dropping it as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just struck me as pretty arbitrary to filter this one random key. It might look similar to the "id" key but is really just a random data field like all the others. So I assumed an understandable mixup. Yes, let's discuss (universal) handling in another PR.

mbuf_reserve(buf, sizeof (metric_name) + 1000);
expand_topic_string(metric_name, influx->metric_format, data, influx->hostname, influx_sanitize_tag);
mbuf_snprintf(buf, "%s", metric_name);
} else if (!data_model) {
Copy link
Collaborator

@zuckschwerdt zuckschwerdt Aug 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this? influx->metric_format is a user setting but data_model is an event property, is there a logic to mixing those within a condition?


typedef char *(expand_string_sanitizer)(char *, char *);

char *expand_topic_string(char *topic, char const *format, data_t *data, char const *hostname, expand_string_sanitizer sanitizer);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could run the sanitizer after each expand_topic_string call (it would be more efficient for multiple expansions and less if there was no expansion, but overall neglible). Less nesting/passing of calls and a clear order of expand then sanitize in the calling code. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, good as it is. We need to run the sanitizer on each trailing part so we don't remove the e.g. slashes in MQTT topics.

@zuckschwerdt
Copy link
Collaborator

Great stuff! This was already requested IIRC.

snprintf(ctx->extra_headers, sizeof (ctx->extra_headers), "Authorization: Token %s\r\n", token);
strncpy(ctx->metric_format, metric_format, RTL433_INFLUX_METRIC_LENGTH);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clang is right, this is a bug ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants