Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/ottl] OTTL MD5 converter #33792

Closed
edmocosta opened this issue Jun 27, 2024 · 13 comments
Closed

[pkg/ottl] OTTL MD5 converter #33792

edmocosta opened this issue Jun 27, 2024 · 13 comments
Assignees
Labels
enhancement New feature or request pkg/ottl priority:p2 Medium

Comments

@edmocosta
Copy link
Contributor

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

Although MD5 is no longer a recommended/safe hashing algorithm, it is still required for compatibility purposes. OTTL currently does not support it.

Describe the solution you'd like

A new OTTL converter/function MD5(attributes), which when invoked would transform the underlying string data into a MD5 hash value.

Describe alternatives you've considered

No response

Additional context

No response

@edmocosta edmocosta added enhancement New feature or request needs triage New item requiring triage labels Jun 27, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@MovieStoreGuy
Copy link
Contributor

Hey @edmocosta,

it is still required for compatibility purposes

Do you mind explaining what specifically you need to support for compatibility purposes?

My preference is to keep this as unsupported/deprecated and use more secure hashing algorithms.

@edmocosta
Copy link
Contributor Author

Hi @MovieStoreGuy!

I completely agree with you regarding using a more secure/non-colliding hashing algorithm, especially for encryption. But MD5 was widely used for fingerprinting/checksum files for example, and forcing users to change their systems to use a more secure/non-colliding hashing would requires re-hashing and/or performing some kind of migration, which depending on the case, might not be feasible in the short term.

IMO, adding the MD5 function has similar purposes as adding SHA1, and although both are considered unsafe (being MD5 unsafer), they're still in use - unfortunately.

@evan-bradley
Copy link
Contributor

@edmocosta Thanks for the detailed explanation. I think the one difference between MD5 and SHA1 here is that SHA1 hashes are supported by the attributes processor, and the transform processor needs to achieve 1:1 functionality before we will replace it. That said, I think we should at least consider this to boost compatibility with existing systems.

I do have a few questions of my own:

  1. What's the attack vector for hash collisions that could result from using MD5 hashes? Is it risky enough that we should avoid this, or remote enough that we can carefully continue?
  2. If we include this, what's the deprecation timeline?
  3. Would we feel better about this if we put it behind a permanent feature gate that forces the user to acknowledge the risks?

@MovieStoreGuy @TylerHelmuth what do you think?

@evan-bradley evan-bradley added discussion needed Community discussion needed and removed needs triage New item requiring triage labels Jul 1, 2024
@TylerHelmuth
Copy link
Member

It is cumbersome, but there is also an option to fork the transformprocessor and add the desired Converter for your own use.

@flexitrev
Copy link

An MD5 converter was identified here as a missing feature #31930

@edmocosta
Copy link
Contributor Author

Hi @evan-bradley!

Thanks for chime in! Here's my two cents:

I think the one difference between MD5 and SHA1 here is that SHA1 hashes are supported by the attributes processor, and the transform processor needs to achieve 1:1 functionality before we will replace it.

That's exactly what I meant for compatibility purpose. Although SHA1 is also not recommended, it was used while its usages are not replaced by another hashing algorithm.

What's the attack vector for hash collisions that could result from using MD5 hashes? Is it risky enough that we should avoid this, or remote enough that we can carefully continue?

I don't see how it could be used to attack the collector itself. We would only provide a function to allow users to hash a string into a MD5 digest, if they decided to use it for cryptography purposes, that would be a risk, but IMHO, I do believe it should be up to them to weight and assume that risk. Other tools and databases on the market does support the MD5 function, and are not considered insecure just because that.

If we include this, what's the deprecation timeline?

IMO, that's pretty hard to tell, MD5 has long been considered unsafe, but it's still out there and being supported by several tools/databases etc.

Finally, I'm not standing for MD5 or encouraging its usage, but I do think that supporting it might be useful for users that still have to maintain legacy softwares with MD5 hashes.

@edmocosta
Copy link
Contributor Author

Hey folks, any thoughts on this? It seems it was identified as missing feature here.

@kentquirk
Copy link
Member

It's definitely the case that MD5 should not be used for anything secure. But it's available everywhere, from the command line to essentially every programming language, and if what you're doing is looking for a "did this string change" kind of validation, it's a really useful tool. We should support it without stressing about it.

@evan-bradley
Copy link
Contributor

I agree that we should include it. Thanks for digging into this @edmocosta.

@evan-bradley evan-bradley added priority:p2 Medium and removed discussion needed Community discussion needed labels Jul 26, 2024
@edmocosta
Copy link
Contributor Author

Thanks folks for considering this feature! I did submit a PR a time ago, if everyone agrees on that, I think we could re-open it!

@evan-bradley
Copy link
Contributor

Sounds good, I've reopened the PR. Could you make any necessary updates and we can start the review process?

@edmocosta
Copy link
Contributor Author

Sounds good, I've reopened the PR. Could you make any necessary updates and we can start the review process?

It seems the PR code is still valid, I've updated the branch. Thanks!

TylerHelmuth pushed a commit that referenced this issue Aug 9, 2024
**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->

Introduced the `MD5` converter function which returns the MD5
hash/digest of the given `value` (path expression to a string telemetry
field or a literal string).

**Link to tracking Issue:**
#33792

**Testing:** <Describe what testing was performed and which tests were
added.>
- Unit tests
- E2E tests

**Documentation:** 
A new README entry was added for the `MD5` function.

---------

Co-authored-by: Evan Bradley <[email protected]>
f7o pushed a commit to f7o/opentelemetry-collector-contrib that referenced this issue Sep 12, 2024
**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->

Introduced the `MD5` converter function which returns the MD5
hash/digest of the given `value` (path expression to a string telemetry
field or a literal string).

**Link to tracking Issue:**
open-telemetry#33792

**Testing:** <Describe what testing was performed and which tests were
added.>
- Unit tests
- E2E tests

**Documentation:** 
A new README entry was added for the `MD5` function.

---------

Co-authored-by: Evan Bradley <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request pkg/ottl priority:p2 Medium
Projects
None yet
Development

No branches or pull requests

6 participants