-
Notifications
You must be signed in to change notification settings - Fork 910
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update(userspace/falco): add libsinsp state metrics option #2883
Changes from all commits
50bda4f
4d20049
70a14e3
ada5f21
6b05bcf
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,7 +20,7 @@ limitations under the License. | |
|
||
// The version of this Falco engine | ||
#define FALCO_ENGINE_VERSION_MAJOR 0 | ||
#define FALCO_ENGINE_VERSION_MINOR 27 | ||
#define FALCO_ENGINE_VERSION_MINOR 28 | ||
#define FALCO_ENGINE_VERSION_PATCH 0 | ||
|
||
#define FALCO_ENGINE_VERSION \ | ||
|
@@ -34,4 +34,4 @@ limitations under the License. | |
// It represents the fields supported by this version of Falco, | ||
// the event types, and the underlying driverevent schema. It's used to | ||
// detetect changes in engine version in our CI jobs. | ||
#define FALCO_ENGINE_CHECKSUM "dbc34e88ab420320994d85f155dee6baff2dd018aacc00e249f897edc8b1e0f4" | ||
#define FALCO_ENGINE_CHECKSUM "5d488b68856d70300ae37453295383821822d8423af170eb28e1bef52042f0b3" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jasondellaluce the instructions above to calculate the checksum seem outdated. I tried it locally before pushing and it was wrong, had to re-push the correct value based on CI error logs. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I got the right number by running the command in this branch 🤔 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. oh I will double check later, thanks for checking! Maybe I messed up somewhere. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I seem to have some stale Falco version in my local branch somehow.
Not sure what's up here, but it's also not that important as I don't really have any issues on my fedora box ... |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -54,6 +54,11 @@ falco::app::run_result falco::app::actions::open_live_inspector( | |
{ | ||
try | ||
{ | ||
if((s.config->m_metrics_flags & PPM_SCAP_STATS_STATE_COUNTERS)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we enable these stats with a source plugin should we expect all empty values? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think metrics and plugins is currently a bit broken because of some recent libs refactors (linux platform which does not apply for plugins hence we don't fetch some constants that are needed to for example calculate the CPU utilization). Since the kernel drivers are Falco's primary use case, we can work on the plugin cases later. End goal: When running metrics with plugins you should still be able to fetch all libsinsp metrics that are applicable (depending on the nature of the plugin). |
||
{ | ||
inspector->set_sinsp_stats_v2_enabled(); | ||
} | ||
|
||
if (source != falco_common::syscall_source) /* Plugin engine */ | ||
{ | ||
for (const auto& p: inspector->get_plugin_manager()->plugins()) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -72,9 +72,7 @@ falco_configuration::falco_configuration(): | |
m_metrics_interval(5000), | ||
m_metrics_stats_rule_enabled(false), | ||
m_metrics_output_file(""), | ||
m_metrics_resource_utilization_enabled(true), | ||
m_metrics_kernel_event_counters_enabled(true), | ||
m_metrics_libbpf_stats_enabled(true), | ||
m_metrics_flags((PPM_SCAP_STATS_KERNEL_COUNTERS | PPM_SCAP_STATS_LIBBPF_STATS | PPM_SCAP_STATS_RESOURCE_UTILIZATION | PPM_SCAP_STATS_STATE_COUNTERS)), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this change seems to go in the same direction proposed above, one unique config 'to rule them all' There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As commented above we can discuss. |
||
m_metrics_convert_memory_to_mb(true), | ||
m_metrics_include_empty_values(false) | ||
{ | ||
|
@@ -380,9 +378,29 @@ void falco_configuration::load_yaml(const std::string& config_name, const yaml_h | |
m_metrics_interval = falco::utils::parse_prometheus_interval(m_metrics_interval_str); | ||
m_metrics_stats_rule_enabled = config.get_scalar<bool>("metrics.output_rule", false); | ||
m_metrics_output_file = config.get_scalar<std::string>("metrics.output_file", ""); | ||
m_metrics_resource_utilization_enabled = config.get_scalar<bool>("metrics.resource_utilization_enabled", true); | ||
m_metrics_kernel_event_counters_enabled = config.get_scalar<bool>("metrics.kernel_event_counters_enabled", true); | ||
m_metrics_libbpf_stats_enabled = config.get_scalar<bool>("metrics.libbpf_stats_enabled", true); | ||
|
||
m_metrics_flags = 0; | ||
if (config.get_scalar<bool>("metrics.resource_utilization_enabled", true)) | ||
{ | ||
m_metrics_flags |= PPM_SCAP_STATS_RESOURCE_UTILIZATION; | ||
|
||
} | ||
if (config.get_scalar<bool>("metrics.state_counters_enabled", true)) | ||
{ | ||
m_metrics_flags |= PPM_SCAP_STATS_STATE_COUNTERS; | ||
|
||
} | ||
if (config.get_scalar<bool>("metrics.kernel_event_counters_enabled", true)) | ||
{ | ||
m_metrics_flags |= PPM_SCAP_STATS_KERNEL_COUNTERS; | ||
|
||
} | ||
if (config.get_scalar<bool>("metrics.libbpf_stats_enabled", true)) | ||
{ | ||
m_metrics_flags |= PPM_SCAP_STATS_LIBBPF_STATS; | ||
|
||
} | ||
|
||
m_metrics_convert_memory_to_mb = config.get_scalar<bool>("metrics.convert_memory_to_mb", true); | ||
m_metrics_include_empty_values = config.get_scalar<bool>("metrics.include_empty_values", false); | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -343,7 +343,7 @@ void stats_writer::collector::get_metrics_output_fields_wrapper( | |
if (m_last_num_evts != 0 && stats_snapshot_time_delta_sec > 0) | ||
{ | ||
/* Successfully processed userspace event rate. */ | ||
output_fields["falco.evts_rate_sec"] = (double)((num_evts - m_last_num_evts) / (double)stats_snapshot_time_delta_sec); | ||
output_fields["falco.evts_rate_sec"] = std::round((double)((num_evts - m_last_num_evts) / (double)stats_snapshot_time_delta_sec) * 10.0) / 10.0; // round to 1 decimal | ||
} | ||
output_fields["falco.num_evts"] = num_evts; | ||
output_fields["falco.num_evts_prev"] = m_last_num_evts; | ||
|
@@ -358,60 +358,74 @@ void stats_writer::collector::get_metrics_output_fields_additional( | |
const scap_agent_info* agent_info = inspector->get_agent_info(); | ||
|
||
#if !defined(MINIMAL_BUILD) and !defined(__EMSCRIPTEN__) | ||
/* Resource utilization, CPU and memory usage etc. */ | ||
uint32_t nstats = 0; | ||
int32_t rc = 0; | ||
if (m_writer->m_config->m_metrics_resource_utilization_enabled) | ||
uint32_t flags = m_writer->m_config->m_metrics_flags; | ||
|
||
auto buffer = inspector->get_sinsp_stats_v2_buffer(); | ||
auto sinsp_stats_v2 = inspector->get_sinsp_stats_v2(); | ||
sinsp_thread_manager* thread_manager = inspector->m_thread_manager; | ||
const scap_stats_v2* sinsp_stats_v2_snapshot = libsinsp::stats::get_sinsp_stats_v2(flags, agent_info, thread_manager, sinsp_stats_v2, buffer, &nstats, &rc); | ||
|
||
if (sinsp_stats_v2_snapshot && rc == 0 && nstats > 0) | ||
{ | ||
const scap_stats_v2* utilization; | ||
auto buffer = inspector->get_sinsp_stats_v2_buffer(); | ||
utilization = libsinsp::resource_utilization::get_resource_utilization(agent_info, buffer, &nstats, &rc); | ||
if (utilization && rc == 0 && nstats > 0) | ||
for(uint32_t stat = 0; stat < nstats; stat++) | ||
{ | ||
for(uint32_t stat = 0; stat < nstats; stat++) | ||
if (sinsp_stats_v2_snapshot[stat].name[0] == '\0') | ||
{ | ||
break; | ||
} | ||
char metric_name[STATS_NAME_MAX] = "falco."; | ||
strlcat(metric_name, sinsp_stats_v2_snapshot[stat].name, sizeof(metric_name)); | ||
switch(sinsp_stats_v2_snapshot[stat].type) | ||
{ | ||
char metric_name[STATS_NAME_MAX] = "falco."; | ||
strlcat(metric_name, utilization[stat].name, sizeof(metric_name)); | ||
switch(utilization[stat].type) | ||
case STATS_VALUE_TYPE_U64: | ||
if (sinsp_stats_v2_snapshot[stat].value.u64 == 0 && !m_writer->m_config->m_metrics_include_empty_values) | ||
{ | ||
case STATS_VALUE_TYPE_U64: | ||
if (utilization[stat].value.u64 == 0 && !m_writer->m_config->m_metrics_include_empty_values) | ||
{ | ||
break; | ||
} | ||
if (m_writer->m_config->m_metrics_convert_memory_to_mb && strncmp(utilization[stat].name, "container_memory_used", 22) == 0) // exact str match | ||
{ | ||
output_fields[metric_name] = (uint64_t)(utilization[stat].value.u64 / (double)1024 / (double)1024); | ||
} | ||
else | ||
{ | ||
output_fields[metric_name] = utilization[stat].value.u64; | ||
} | ||
break; | ||
case STATS_VALUE_TYPE_U32: | ||
if (utilization[stat].value.u32 == 0 && !m_writer->m_config->m_metrics_include_empty_values) | ||
{ | ||
break; | ||
} | ||
if (m_writer->m_config->m_metrics_convert_memory_to_mb && strncmp(utilization[stat].name, "memory_", 7) == 0) // prefix match | ||
} | ||
if (m_writer->m_config->m_metrics_convert_memory_to_mb) | ||
{ | ||
if (strncmp(sinsp_stats_v2_snapshot[stat].name, "container_memory_used", 22) == 0) // exact str match | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe in the future, we could add a flag inside the sinsp struct saying that the metric is relative to the "memory" In this way we could avoid the specific There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes had something like that in mind for the libs refactor I will start in Dec. For example, in libs there is still a todo comment to add a unit item to the new metrics schema struct. All these if conditions are not what we want as falco should just be a simple loop over the metrics. In addition, will look into the base units concept you shared. |
||
{ | ||
output_fields[metric_name] = (uint32_t)(utilization[stat].value.u32 / (double)1024); | ||
} | ||
else | ||
output_fields[metric_name] = (uint64_t)(sinsp_stats_v2_snapshot[stat].value.u64 / (double)1024 / (double)1024); | ||
|
||
} else if (strncmp(sinsp_stats_v2_snapshot[stat].name, "memory_", 7) == 0) // prefix match | ||
{ | ||
output_fields[metric_name] = utilization[stat].value.u32; | ||
} | ||
break; | ||
case STATS_VALUE_TYPE_D: | ||
if (utilization[stat].value.d == 0 && !m_writer->m_config->m_metrics_include_empty_values) | ||
output_fields[metric_name] = (uint64_t)(sinsp_stats_v2_snapshot[stat].value.u64 / (double)1024); | ||
} else | ||
{ | ||
break; | ||
output_fields[metric_name] = sinsp_stats_v2_snapshot[stat].value.u64; | ||
} | ||
output_fields[metric_name] = utilization[stat].value.d; | ||
} | ||
else | ||
{ | ||
output_fields[metric_name] = sinsp_stats_v2_snapshot[stat].value.u64; | ||
} | ||
break; | ||
case STATS_VALUE_TYPE_U32: | ||
if (sinsp_stats_v2_snapshot[stat].value.u32 == 0 && !m_writer->m_config->m_metrics_include_empty_values) | ||
{ | ||
break; | ||
default: | ||
} | ||
if (m_writer->m_config->m_metrics_convert_memory_to_mb && strncmp(sinsp_stats_v2_snapshot[stat].name, "memory_", 7) == 0) // prefix match | ||
{ | ||
output_fields[metric_name] = (uint32_t)(sinsp_stats_v2_snapshot[stat].value.u32 / (double)1024); | ||
} | ||
else | ||
{ | ||
output_fields[metric_name] = sinsp_stats_v2_snapshot[stat].value.u32; | ||
} | ||
break; | ||
case STATS_VALUE_TYPE_D: | ||
if (sinsp_stats_v2_snapshot[stat].value.d == 0 && !m_writer->m_config->m_metrics_include_empty_values) | ||
{ | ||
break; | ||
} | ||
output_fields[metric_name] = sinsp_stats_v2_snapshot[stat].value.d; | ||
break; | ||
default: | ||
break; | ||
} | ||
} | ||
} | ||
|
@@ -424,18 +438,16 @@ void stats_writer::collector::get_metrics_output_fields_additional( | |
/* Kernel side stats counters and libbpf stats if applicable. */ | ||
nstats = 0; | ||
rc = 0; | ||
uint32_t flags = 0; | ||
|
||
if (m_writer->m_config->m_metrics_kernel_event_counters_enabled) | ||
{ | ||
flags |= PPM_SCAP_STATS_KERNEL_COUNTERS; | ||
} | ||
if (m_writer->m_config->m_metrics_libbpf_stats_enabled && (inspector->check_current_engine(BPF_ENGINE) || inspector->check_current_engine(MODERN_BPF_ENGINE))) | ||
if (!(inspector->check_current_engine(BPF_ENGINE) || inspector->check_current_engine(MODERN_BPF_ENGINE))) | ||
{ | ||
flags |= PPM_SCAP_STATS_LIBBPF_STATS; | ||
flags &= ~PPM_SCAP_STATS_LIBBPF_STATS; | ||
} | ||
const scap_stats_v2* stats_v2 = inspector->get_capture_stats_v2(flags, &nstats, &rc); | ||
if (stats_v2 && nstats > 0 && rc == 0) | ||
|
||
// Note: ENGINE_FLAG_BPF_STATS_ENABLED check has been moved to libs, that is, when libbpf stats is not enabled | ||
// in the kernel settings we won't collect them even if the end user enabled the libbpf stats option | ||
|
||
const scap_stats_v2* scap_stats_v2_snapshot = inspector->get_capture_stats_v2(flags, &nstats, &rc); | ||
if (scap_stats_v2_snapshot && nstats > 0 && rc == 0) | ||
{ | ||
/* Cache n_evts and n_drops to derive n_drops_perc. */ | ||
uint64_t n_evts = 0; | ||
|
@@ -444,24 +456,28 @@ void stats_writer::collector::get_metrics_output_fields_additional( | |
uint64_t n_drops_delta = 0; | ||
for(uint32_t stat = 0; stat < nstats; stat++) | ||
{ | ||
if (scap_stats_v2_snapshot[stat].name[0] == '\0') | ||
{ | ||
break; | ||
} | ||
// todo: as we expand scap_stats_v2 prefix may be pushed to scap or we may need to expand | ||
// functionality here for example if we add userspace syscall counters that should be prefixed w/ `falco.` | ||
char metric_name[STATS_NAME_MAX] = "scap."; | ||
strlcat(metric_name, stats_v2[stat].name, sizeof(metric_name)); | ||
switch(stats_v2[stat].type) | ||
strlcat(metric_name, scap_stats_v2_snapshot[stat].name, sizeof(metric_name)); | ||
switch(scap_stats_v2_snapshot[stat].type) | ||
{ | ||
case STATS_VALUE_TYPE_U64: | ||
/* Always send high level n_evts related fields, even if zero. */ | ||
if (strncmp(stats_v2[stat].name, "n_evts", 7) == 0) // exact not prefix match here | ||
if (strncmp(scap_stats_v2_snapshot[stat].name, "n_evts", 7) == 0) // exact not prefix match here | ||
{ | ||
n_evts = stats_v2[stat].value.u64; | ||
n_evts = scap_stats_v2_snapshot[stat].value.u64; | ||
output_fields[metric_name] = n_evts; | ||
output_fields["scap.n_evts_prev"] = m_last_n_evts; | ||
n_evts_delta = n_evts - m_last_n_evts; | ||
if (n_evts_delta != 0 && stats_snapshot_time_delta_sec > 0) | ||
{ | ||
/* n_evts is total number of kernel side events. */ | ||
output_fields["scap.evts_rate_sec"] = (double)(n_evts_delta / stats_snapshot_time_delta_sec); | ||
output_fields["scap.evts_rate_sec"] = std::round((double)(n_evts_delta / stats_snapshot_time_delta_sec) * 10.0) / 10.0; // round to 1 decimal | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe at some point we could compute these metrics directly in sisnp and expose just a list to Falco in this way we could avoid specific There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes totally agreed to push almost everything (or what we can) down into libs. |
||
} | ||
else | ||
{ | ||
|
@@ -470,28 +486,28 @@ void stats_writer::collector::get_metrics_output_fields_additional( | |
m_last_n_evts = n_evts; | ||
} | ||
/* Always send high level n_drops related fields, even if zero. */ | ||
else if (strncmp(stats_v2[stat].name, "n_drops", 8) == 0) // exact not prefix match here | ||
else if (strncmp(scap_stats_v2_snapshot[stat].name, "n_drops", 8) == 0) // exact not prefix match here | ||
{ | ||
n_drops = stats_v2[stat].value.u64; | ||
n_drops = scap_stats_v2_snapshot[stat].value.u64; | ||
output_fields[metric_name] = n_drops; | ||
output_fields["scap.n_drops_prev"] = m_last_n_drops; | ||
n_drops_delta = n_drops - m_last_n_drops; | ||
if (n_drops_delta != 0 && stats_snapshot_time_delta_sec > 0) | ||
{ | ||
/* n_drops is total number of kernel side event drops. */ | ||
output_fields["scap.evts_drop_rate_sec"] = (double)(n_drops_delta / stats_snapshot_time_delta_sec); | ||
output_fields["scap.evts_drop_rate_sec"] = std::round((double)(n_drops_delta / stats_snapshot_time_delta_sec) * 10.0) / 10.0; // round to 1 decimal | ||
} | ||
else | ||
{ | ||
output_fields["scap.evts_drop_rate_sec"] = (double)(0); | ||
} | ||
m_last_n_drops = n_drops; | ||
} | ||
if (stats_v2[stat].value.u64 == 0 && !m_writer->m_config->m_metrics_include_empty_values) | ||
if (scap_stats_v2_snapshot[stat].value.u64 == 0 && !m_writer->m_config->m_metrics_include_empty_values) | ||
{ | ||
break; | ||
} | ||
output_fields[metric_name] = stats_v2[stat].value.u64; | ||
output_fields[metric_name] = scap_stats_v2_snapshot[stat].value.u64; | ||
break; | ||
default: | ||
break; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about having a list of metrics instead of many boolean values? maybe something like
if we like the idea we could deprecate the old flags and introduce this new config. If we go in this way probably I would merge
libbpf_stats_enabled
intokernel_event_counters_enabled
, in the end, to have thelibbpf_stats_enabled
you need to enable the statssudo sysctl -w kernel.bpf_stats_enabled=1
, so we already have a knob. Moreover, it is something only related tobpf
so maybe it is better to be more agnostic and hide it insidekernel_countes
WDYT?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can discuss a
enabled_metrics
list. Given we are deprecating more urgent old settings I am not be sure if this is worth it in the near-term. Since metrics is now "Stable" we would need to follow a deprecation cycle. I would propose to open a discussion issue, tag all libs and falco maintainers and if the majority votes in favor ofenabled_metrics: ["kernel_countes", ... ]
start the proper deprecation cycle. WDYT?Kernel settings can be shared amongst teams deploying their tools. Therefore, it can be fair to assume that maybe another team enabled libbpf stats, but you still have a use case of not wanting to collect libbpf stats for Falco. I would prefer giving the end user full control and be very selective with the metrics especially given it's quite a lot of metrics and perhaps you only need some of them sometimes. With that I would also not merge libbf stats and the kernel event counters. All these reasons were the motivation to have these flags that are all teh way pushed down to libs in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah we can discuss it without blocking the PR, i will tag all here so even if we merge this we can discuss with the clear scope of this PR @falcosecurity/falco-maintainers
Re
libbpf_stats_enabled
: my idea was to use the sysctl optionkernel.bpf_stats_enabled
as a knob, so if a team wants to enable libbpf stats together with the kernel stats it has just to type asudo sysctl -w kernel.bpf_stats_enabled=1
on the node, while if they want to disable them they could usesudo sysctl -w kernel.bpf_stats_enabled=0
. The libsinsp code at every interval could checkkernel.bpf_stats_enabled
(or maybe just at init time, it depends on what we want) and add the libbpf metrics if needed. BTW it was just an idea, no strong opinion hereThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree about using
sysctl
valuekernel.bpf_stats_enabled
to chose whether to enable/disable our kernel stats without further configuration by the user, it's a great idea actually! We could use its value as a default value for the configuration key and ship it commented? (same forkernel_event_counters_enabled
eventually)I don't have a strong opinion on this one; i think from an UX perspective it is better to already have all the keys and just set "true"/"false" whether i need them, than to add strings inside a set.