Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cleaner] Improve IPV6 MAC address RE #3766

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

pmoravec
Copy link
Contributor

@pmoravec pmoravec commented Aug 29, 2024

Current RE obfuscates also UUID strings, let be more strict.

Resolves: #3766
Relevant: #3736


Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines

  • Is the commit message split over multiple lines and hard-wrapped at 72 characters?
  • Is the subject and message clear and concise?
  • Does the subject start with [plugin_name] if submitting a plugin patch or a [section_name] if part of the core sosreport code?
  • Does the commit contain a Signed-off-by: First Lastname [email protected]?
  • Are any related Issues or existing PRs properly referenced via a Closes (Issue) or Resolved (PR) line?
  • Are all passwords or private data gathered by this PR obfuscated?

pmoravec added a commit to pmoravec/sos that referenced this pull request Aug 29, 2024
Current RE obfuscates also UUID strings, let be more strict.

Resolves: sosreport#3766
Relevant: sosreport#3736

Signed-off-by: Pavel Moravec <[email protected]>
@pmoravec pmoravec force-pushed the sos-pmoravec-cleaner-IPV6_REG_4HEX branch from b0f5530 to 3d76300 Compare August 29, 2024 12:23
Copy link

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo yum install -y dnf-plugins-core on RHEL 8
  • sudo dnf install -y dnf-plugins-core on Fedora
  • dnf copr enable packit/sosreport-sos-3766
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

pmoravec added a commit to pmoravec/sos that referenced this pull request Aug 29, 2024
Current RE obfuscates also UUID strings, let be more strict.

Resolves: sosreport#3766
Relevant: sosreport#3736

Signed-off-by: Pavel Moravec <[email protected]>
@pmoravec pmoravec force-pushed the sos-pmoravec-cleaner-IPV6_REG_4HEX branch from 3d76300 to 3b6a1fe Compare August 29, 2024 12:25
@TurboTurtle
Copy link
Member

In testing this, I've found a false positive match. It is a little artificial, but I think worth discussing.

We picked up an IPv6 substring as a mac address match, in /etc/java/java-17-openjdk/java-17-openjdk-17.0.12.0.7-2.fc39.x86_64/conf/sdp/sdp.conf.template (note, we have a separate issue where the openjdk filepath is getting obfuscated as an IP address).

That's a rather weird filepath, but it gets picked up by the java plugin (and has for a long time), and is dropped by version specific openjdk packages as an example config file. In this example file are comment lines about bind addresses, e.g. (this is the original):

# Use SDP for all sockets that bind to specific local addresses
#bind    192.168.6.1    *
#bind    fe80::40da:1e0e:051f:fe31    *

The second bind address is getting obfuscated as a mac address, as it gets converted to:

# Use SDP for all sockets that bind to specific local addresses
#bind    101.0.0.3    *
#bind    fe80::534f:53ff:fead:a1f6    *

This is likely influenced by the fact that we skip link-local addresses (starting with fe80:) for obfuscation in the ipv6 mapper, but it is entirely probable that if we were obfuscating this address with the ipv6 mapper, we'd then erroneously re-obfuscate the substring again.

This specific example is in a comment of an example template, but we could very well see real world collisions like this.

@pmoravec
Copy link
Contributor Author

pmoravec commented Sep 5, 2024

You mean cleaner should skip obfuscating the fe80::40da:1e0e:051f:fe31 because it is a link-local address (starting with fe80:)? That makes sense, anyway the current code does obfuscate it as well, my PR is not a regression here.

I can fix this within my PR, but I feel it rather deserves a new issue/PR. As I think the fix should change the code elsewhere than in the RE declaration (is there a not-over-complex RE stating "I am IPV6 MAC address but not link-local one"? I would rather add a test "RE matched, but isn't it link-local?" later on).

Is my idea "fix elsewhere than in RE declaration" sane? Worth extending the PR or raising a new issue (that I can work on, I dont mean my post to kick the concern away from me)? My 2c is to raise a new issue I will prepare a PR for.

@TurboTurtle
Copy link
Member

This isn't a direct case of skipping fe80 addresses. Also, let me clarify. We skip precisely the fe80 part that makes it a link-local address, not the entire address. I had to check that again, but that is the design and my notes reflect that as the intent - keep the signal (fe80: ) that this address is link-local to not throw off investigations where that is likely pertinent information.

With current main, the part of the address after fe80 is indeed obfuscated by the IPv6 IP mapper, which is correct.

With this PR, the part after fe80 is getting obfuscated by the IPv6 MAC mapper, which is wrong. Specifically, the address is actually getting obfuscated first by the IP mapper, and then again by the MAC mapper, which throws off investigations into this (it certainly did for me).

If you clear your default_mapping file and run this locally, then you'll see an entry for the IPv6 IP mapper in the private_map that gets generated for your eth0 (or equivalent) ipv6 address, and then that address (after fe80) getting obfuscated by the mac mapper.

pmoravec added a commit to pmoravec/sos that referenced this pull request Sep 24, 2024
Current RE obfuscates also UUID strings, let be more strict.

Resolves: sosreport#3766
Relevant: sosreport#3736

Signed-off-by: Pavel Moravec <[email protected]>
@pmoravec pmoravec force-pushed the sos-pmoravec-cleaner-IPV6_REG_4HEX branch from 3b6a1fe to a9b0161 Compare September 24, 2024 08:13
@pmoravec
Copy link
Contributor Author

This was dirty and I hit a limitation of re library that requires fixed width of negative lookbehind string. The limitation is responsible for several grey hairs of mine, plus on not-precise string at the end - there is still a case the cleaner will match when it shouldn't. But at least no regression here..

Try my script:

#!/usr/bin/python3
import re, sys

IPV6_REG_4HEX_OLD = (
    r'((?<!([0-9a-fA-F\'\"]:)|::)(([^:\-]?[0-9a-fA-F]{4}(:|-)){3}'
    r'[0-9a-fA-F]{4}(\'|\")?(\/|\,|\-|\.|\s|$)))'
)

IPV6_REG_4HEX_NEW = (
        r'((?<!(?:([.|^|\b]{5}\w|[.|^|\b]fe80:|fe80::)))'
        r'(([0-9a-fA-F]{4}:){3}[0-9a-fA-F]{4})(?!\w))|'
        r'((?<!\w)(([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{4})(?!\w))'
)

IPV6_REG_4HEX = IPV6_REG_4HEX_OLD if (len(sys.argv)>1 and sys.argv[1]=="old") else IPV6_REG_4HEX_NEW

test_strings = (
        (False, '/pulp/api/v3/content/rpm/packages/0190fe17-d1ae-7091-a034-22ade2121dc0/'),
        (False, '0190fe17-d1ae-7091-a034-22ade2121dc0'),
        (True, 'aabb:ccee:ddee:ffaa'),
        (True, 'line=aabb:ccee:ddee:ffaa'),
        (True, 'line=aabb:ccee:ddee:ffaa, something else'),
        (True, 'address:aabb:ccee:ddee:ffaa'),
        (True, 'addr=[aabb:ccee:ddee:ffaa]'),
        (True, 'addr="aabb:ccee:ddee:ffaa"'),
        (True, "addr='aabb:ccee:ddee:ffaa'"),
        (True, 'aabb:ccee:ddee:ffaa, but not aa:bb:cc:dd'),
        (True, 'line=aabb:ccee:ddee:ffaa-should still match'),
        (False, 'not aabb:ccee:ddee:ffaaaaaa'),
        (True, 'logged_mac:aabb:ccee:ddee:ffaa=some_other_data'),
        (True, 'logged_mac:aabb:ccee:ddee:ffaa some_other_data'),
        (True, '::aabb:ccee:ddee:ffaa'),
        (True, ':aabb:ccee:ddee:ffaa'),
        (True, 'we should obfuscate addr: aabb:ccee:ddee:ffaa: YES'),
        (False, 'fe80::534f:53ff:fead:a1f6 link local, dont touch'),
        (False, '#bind    fe80::534f:53ff:fead:a1f6    *'),
        (False, 'aaabb:ccee:ddee:ffaa'),
)

intro=f"Matching against {IPV6_REG_4HEX}"
print(intro)
print("-"*len(intro))
print()

for match, line in test_strings:
    search = re.search(IPV6_REG_4HEX, line)
    print(f"{line}\t:\t\t{search}")
    if (None!=search) != match:
        print(f"\t\tERROR {line} should {'' if match else 'not '}match, but re.search={search}")

That fails with current IPV6_REG_4HEX on 8 tests. The proposed string fails only on one where the current one is failing as well:

aaabb:ccee:ddee:ffaa	:		<re.Match object; span=(1, 20), match='aabb:ccee:ddee:ffaa'>
		ERROR aaabb:ccee:ddee:ffaa should not match, but re.search=<re.Match object; span=(1, 20), match='aabb:ccee:ddee:ffaa'>

The match works differently for addresses with : or with - as a delimiter. That is by purpose. My aim is to stop matching UUIDs (having -) while respecting the "fe80: link-local" requirement would hit the re.error: look-behind requires fixed-width pattern error. The "fixed length" in negative lookbehind does not work well (see the counterexample, "negative of multiple line starts" does not work), but that is the compromise to skip the link-local addresses.

pmoravec added a commit to pmoravec/sos that referenced this pull request Sep 24, 2024
Current RE obfuscates also UUID strings, let be more strict.

Resolves: sosreport#3766
Relevant: sosreport#3736

Signed-off-by: Pavel Moravec <[email protected]>
@pmoravec pmoravec force-pushed the sos-pmoravec-cleaner-IPV6_REG_4HEX branch from a9b0161 to cc91383 Compare September 24, 2024 08:22
Current RE obfuscates also UUID strings, let be more strict.

Resolves: sosreport#3766
Relevant: sosreport#3736

Signed-off-by: Pavel Moravec <[email protected]>
@pmoravec pmoravec force-pushed the sos-pmoravec-cleaner-IPV6_REG_4HEX branch from cc91383 to 94c81c6 Compare September 24, 2024 08:43
@pmoravec
Copy link
Contributor Author

Why pylint repeatedly fails on code I havent touched..? :-o

@jcastill
Copy link
Member

Why pylint repeatedly fails on code I havent touched..? :-o

It seems that this was added to pylint via pylint-dev/pylint#9842 not too long ago. I'll open an issue or a PR to discuss what to do with it.

@pmoravec
Copy link
Contributor Author

Hi @arif-ali or @TurboTurtle , anything blocking here? This slows down cleaner really a lot, when it processes a file with many UUIDs wrongly treated as IPv6 MAC addresses :(

The pylint failing test should happen only on my branch - should I rebase it to have @jcastill 's patch of it?

@TurboTurtle
Copy link
Member

This one slipped by me, I'll retest this week and update my review.

@arif-ali
Copy link
Member

arif-ali commented Nov 19, 2024

I'll have a look, apologies, missed it too.

We ought to use the GH labels a bit more, so that we can see at a glance what still needs reviewing, and which is is waiting on the owner of the PR.

I do try to go through every PR every now and again to look though stuff, which is not ideal and time consuming.

no need for rebase for pylint specifically

@arif-ali arif-ali added the Needs Review Review required label Nov 20, 2024
@arif-ali
Copy link
Member

I'm not too familiar with IPV6 MAC, however, used the regex101, and see that everything you're testing is great. One thing to note, and that could be just my lack of knowledge on IPv6

fe80:2216:3eff:fe94 matches
fe80::216:3eff:fe94:c8a9 doesn't match

I am presuming that is expected, other than that, looks good to me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review Review required
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants