Resolve discrepancy between text subfield handling for *.name fields in ecs@mappings #2353
Comments
I am not too involved with ECS after its sync with OTel, so this is more a personal preference than anything else. I feel that option 1 is the most feasible: elastic-package validation should be modified to allow multi-fields to be mapped to "at least" one of the multiple types defined in a multi-field. I do not know the impact on the work of syncing ECS with OTel. Even though option 1 is the most feasible, I would have liked to see all .name fields changed to keyword-only mappings, though it would be good to check whether we take advantage of multi-fields on .name anywhere in our stack (UI, SIEM/monitoring rules, prebuilt ML jobs, etc.). |
I don't agree that the current state of So IMHO, if there's any reason to consider option 2, it would be the storage savings. Since I don't know enough to say anything about the actual storage overhead in practice or the full consequences of changing ECS, I don't have good advice on which of the options is best. |
I think you can argue both ways. The issue is that there's no clear definition of whether or not having additional fields is ECS compliant, which leads some to assume it is compliant and others to assume it's not (like the elastic-package validation). One valid outcome of this is that we declare that having additional subfields is ECS compliant, which is option 1. I don't have a strong opinion, but I'm leaning towards either option 2 or 3, for the reason I mentioned in the description:
Adding a text sub-field requires more storage but enables richer query capabilities. Whether or not that tradeoff is worth it should be decided in the process of adding a field to ECS. We may decide that all |
This probably deserves a separate discussion. But in short, I think we should have a process that adds all (maybe only stable?) Semantic Convention fields to ECS, at which time we'll decide on the Elasticsearch field type. Ideally, this should be guided by naming conventions (like ECS would then become a superset of Semantic Conventions plus a mapping to Elasticsearch field types. |
Thinking about this again, I think an integration should still be considered ECS compliant if it adds additional sub-fields that aren't part of ECS but are relevant to that specific integration (like additional Having said that, I think the point still stands that whether or not a field should have a |
I think it is not so much a question about ECS compliance, but about what to expect when using
Regarding the options, if we think that it is a good idea that all In the meantime, for packages and |
That is the point I keep thinking about. There are 3 different approaches: index as little as possible, find the perfect balance, or index everything. Historically we opted for indexing everything, as that is how the system stays fast. Finding the perfect balance is not possible, because the same data is used for different use cases (o11y, security), and even inside use cases there are different scenarios. Indexing as little as possible has the downside that some "default" queries might not be fast. What if we would have There are fields that potentially always have |
I don't think we need to entirely disable the validation on multi-fields. But I think it makes sense to be lenient when there are additional multi-fields that were not expected.
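A minimal sketch of what "lenient" could mean here, in Python for brevity. This is not how elastic-package implements validation; the function name `validate_multi_fields` and the dict-based representation of multi-fields are assumptions made up for illustration. Expected multi-fields must be present with the right type, while unexpected extras are only reported in strict mode.

```python
def validate_multi_fields(field, expected, actual, lenient=True):
    """Compare the multi-fields of one field against its definition.

    `expected` and `actual` map sub-field name -> Elasticsearch type,
    e.g. {"text": "match_only_text"}. Hypothetical helper, not a real API.
    """
    errors = []
    # A defined multi-field must always exist and have the defined type.
    for name, expected_type in expected.items():
        if name not in actual:
            errors.append(f"{field}.{name}: expected multi-field is missing")
        elif actual[name] != expected_type:
            errors.append(
                f"{field}.{name}: expected type {expected_type}, got {actual[name]}"
            )
    # Additional multi-fields are only an error when validation is strict.
    if not lenient:
        for name in actual:
            if name not in expected:
                errors.append(f"{field}.{name}: unexpected multi-field")
    return errors
```

With this shape, a keyword-only definition that receives an extra .text sub-field passes in lenient mode and fails only when `lenient=False`.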
In this case, it's not really about whether or not to index or whether queries are fast or slow. It's what capabilities are supported in search (exact matches with keyword vs match queries with match_only_text). But I think I see where you're going with that in general. Some kind of solution or use-case specific extension to ECS does make sense to me. Ideally, that would also be based around consistent naming conventions. Not sure if it's a choice the user needs to make, though. For certain workflows or curated experiences, we may have expectations around how a field gets indexed. In that case, we should enable the additional mappings by default. |
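To make the capability difference concrete, here is a small sketch of the two query shapes, written as Python dicts in Elasticsearch Query DSL. The field user.name and the helper names are assumptions chosen for illustration. A term query on the keyword field only matches the exact stored value, while a match query on the .text sub-field analyzes the input and can match individual words; the second query only works if the .text sub-field is actually mapped.

```python
def exact_match_query(field, value):
    # term query: compares against the exact, un-analyzed keyword value
    return {"query": {"term": {field: value}}}


def full_text_query(field, text):
    # match query: runs against the analyzed .text multi-field
    return {"query": {"match": {f"{field}.text": text}}}


keyword_query = exact_match_query("user.name", "Albert Einstein")
text_query = full_text_query("user.name", "einstein")
```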
Proposal on a way forward:
Not sure whether we should change something in the behavior of the |
Proposal SGTM.
I think we need to change the behavior at least to support versions of
Yeah, it is more or less like that. But it doesn't remove any field, it checks that any field in a document has a definition in the package itself or in ECS. It also checks that the values of the fields match with the defined type.
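The two checks described above can be sketched roughly as follows. This is an illustrative Python approximation, not elastic-package's actual code; `flatten`, `validate_document`, the `TYPE_CHECKS` table, and the dict-based field definitions are all hypothetical.

```python
def flatten(doc, prefix=""):
    """Yield (dotted.path, value) pairs for every leaf in a nested document."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


# Very rough value checks for a few Elasticsearch types (assumption: only
# these three types matter for the sketch).
TYPE_CHECKS = {
    "keyword": lambda v: isinstance(v, str),
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def validate_document(doc, package_fields, ecs_fields):
    """Report undefined fields and values that do not fit the defined type."""
    errors = []
    for path, value in flatten(doc):
        # A definition in the package takes precedence, then ECS.
        field_type = package_fields.get(path, ecs_fields.get(path))
        if field_type is None:
            errors.append(f"{path}: field is not defined")
            continue
        check = TYPE_CHECKS.get(field_type)
        if check is not None and not check(value):
            errors.append(f"{path}: value does not match type {field_type}")
    return errors
```

Note that nothing is removed from the document; the function only reports problems, which mirrors the behavior described in the comment.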
Yes, elastic-package uses Historically (since the times of beats) we have been relying on |
I just realized that such a thread already exists: #2118. However, there's no conclusion as of yet. |
The ecs@mappings component template that ships with Elasticsearch by default has a dynamic template with path match on *.name that adds a .text subfield. However, in actual ECS, not all *.name fields have a .text subfield.

The reasons why we still added the *.name rule in elastic/elasticsearch#96171 included making the component template smaller, more consistent, more generic, and more forward compatible, in the sense that we don't need to constantly add new field definitions for new *.name fields.

However, in effect, the ecs@mappings component template isn't technically ECS compliant, which leads to issues like elastic-package reporting errors when integrations rely on ecs@mappings: elastic/elastic-package#1971. It also seems like whether or not a field should have a text subfield is a decision we should make at the ECS level rather than being a side-effect of how ecs@mappings is implemented.

In total, there are 150 ECS fields that end with .name. Out of these, 41 have a .text sub-field and 109 don't.

There are multiple options to move forward from here:

1. [...] ecs@mappings has a less efficient mapping compared to "proper ECS".
2. [...] ecs@mappings with the current definition of ECS. We'll probably implement this by listing fields that have a .text subfield as there are fewer of them. In other words, *.name fields won't have a .text subfield by default. We'll need to expect that changes to ecs@mappings are going to be a bit more frequent and that the mapping is less forwards compatible.
3. [...] ecs@mappings to make sure all *.name fields are mapped consistently. It'll be a bit easier for users to reason about what type of queries they can expect to work on *.name fields. It would also bring us closer to a place where ECS is built around naming conventions rather than one-off per field decisions. However, this would add a bit more storage overhead compared to what we have today.

cc'ing a couple of folks that may have thoughts on this: @ruflin @eyalkoren @jsoriano @zmoog @andrewkroh @P1llus
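To illustrate the discrepancy described in this issue, here is a rough Python sketch. The rule below is a simplified stand-in for the *.name dynamic template, not a copy of the real ecs@mappings definition, and the three-field ECS sample with its has-.text flags is an assumption picked for illustration. `fnmatchcase` is used as an approximation of path_match wildcard matching.

```python
from fnmatch import fnmatchcase

# Simplified stand-in for the dynamic template rule (assumption, not the
# actual ecs@mappings content): every *.name field becomes a keyword with
# a match_only_text multi-field called "text".
NAME_RULE = {
    "path_match": ["*.name"],
    "mapping": {
        "type": "keyword",
        "fields": {"text": {"type": "match_only_text"}},
    },
}

# Illustrative sample: field path -> whether ECS defines a .text sub-field.
ECS_SAMPLE = {"user.name": True, "process.name": True, "host.name": False}


def dynamic_mapping_for(path, rule=NAME_RULE):
    """Return the mapping the rule would apply to `path`, or None."""
    for pattern in rule["path_match"]:
        if fnmatchcase(path, pattern):
            return rule["mapping"]
    return None


def discrepancies(ecs_fields):
    """List fields where the rule and ECS disagree about a .text sub-field."""
    mismatched = []
    for path, has_text in sorted(ecs_fields.items()):
        mapping = dynamic_mapping_for(path)
        rule_adds_text = mapping is not None and "text" in mapping.get("fields", {})
        if rule_adds_text != has_text:
            mismatched.append(path)
    return mismatched
```

Run against the full ECS field list instead of the sample, a check of this kind would be expected to flag the *.name fields that ECS defines without a .text sub-field.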