Is your feature request related to a problem? Please describe.
We run a SaaS product collecting log data from thousands of customers and applications. Their logs are in various formats and styles.
We have an index template that defines some well-known fields with specific types, which gives us and our customers additional functionality in our product, such as sorting and range queries. We do our best to normalize data; however, we cannot account for all cases. When a mapped field receives a value of a different type, or a value that cannot be coerced, the entire document is rejected. In that case we are forced to remove as much extraneous data as possible in an effort to index what we feel is absolutely critical. For logs, this is mainly the level and message. This isn't ideal, as we are throwing away customer data, and in many cases entire subsets of log lines.
A prime example is timestamps. They are great for sorting and range queries, but they arrive in an impossible number of formats. We have been reactively adding date formats to accommodate a growing number of customers using non-standard date formats, which typically force us to drop their data.
Currently one of our datetime fields has grown as such:
This is still not enough to accommodate what our customers are sending us, and it is a never-ending problem.
Removing it from the index mapping means we lose certain sets of functionality. Keeping it means we're most certainly losing data.
Describe the solution you'd like
Elasticsearch has an ignore_malformed setting, available on the root template and at the field level, that ignores such conflicts field by field and indexes what is possible rather than rejecting the entire document.
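For reference, here is a rough sketch of how this looks on the Elasticsearch side, written as a Python dict that would be sent as the index (or template) body. The field names `timestamp`, `level`, and `message` and the date format string are illustrative assumptions, not our actual mapping.

```python
import json

# Hypothetical index body; field names and the date format are illustrative only.
index_body = {
    "settings": {
        # Index-level default for field types that support ignore_malformed
        "index.mapping.ignore_malformed": True,
    },
    "mappings": {
        "properties": {
            "timestamp": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_millis",
                # Field-level setting: a malformed value is skipped for this
                # field instead of causing the whole document to be rejected
                "ignore_malformed": True,
            },
            "level": {"type": "keyword"},
            "message": {"type": "text"},
        }
    },
}

print(json.dumps(index_body, indent=2))
```

With a mapping like this, a document whose `timestamp` cannot be parsed would still be indexed with its other fields; only the malformed field is left unindexed.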
Describe alternatives you've considered
Removing field mappings for problematic fields
Adding more date formats to capture as many of the formats we encounter as possible
Stripping ingested documents to the bare essential fields when there is a field conflict. Even in this case the stripped document can have conflicts, in which case we have to discard the entire document.
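To make the last alternative concrete, here is a minimal sketch of the stripping fallback. The function name and the choice of `level` and `message` as the essential fields are assumptions for illustration; our real pipeline differs in detail.

```python
# Assumed set of "bare essential" fields, based on the level and message
# mentioned above; purely illustrative.
ESSENTIAL_FIELDS = ("level", "message")


def strip_to_essentials(doc, keep=ESSENTIAL_FIELDS):
    """Return a copy of doc containing only the essential fields that are present."""
    return {key: doc[key] for key in keep if key in doc}


rejected = {
    "level": "error",
    "message": "disk full",
    "timestamp": "yesterday-ish",  # value the date mapping cannot parse
    "host": "web-1",
}
print(strip_to_essentials(rejected))
```

Everything outside the essential fields is lost, and if one of the remaining fields itself conflicts with the mapping, the document is still rejected.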
Additional context
https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-malformed.html#_dealing_with_malformed_fields