Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(libsinsp): rewrite concatenate_paths with std::filesystem #1533

Merged
merged 10 commits into from
Dec 12, 2023

Conversation

LucaGuerra
Copy link
Contributor

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug
/kind cleanup

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area libsinsp

/area tests

Does this PR require a change in the driver versions?

No

What this PR does / why we need it:

This gets rid of a scary function that was operating on path strings by hand and is prone to breakage or (worse) memory error issues and replaces with a native C++ implementation.

@incertum I still have a failing test and I would like to be more certain of when we depend on specific behaviors in libs before I go through. Can you take a look? Thanks!

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@incertum
Copy link
Contributor

incertum commented Dec 2, 2023

@LucaGuerra thank you very much for starting this PR! Continued a bit.

openat* filtercheck was a quick fix.

Then there were issues with cwd that I could replicate locally:

libs/userspace/libsinsp/test/events_proc.ut.cpp:487: Failure
Expected equality of these values:
  get_field_as_string(evt, "proc.cwd")
    Which is: "/tmp/target-directory"
  "/tmp/target-directory/"

Seems to be fixed now as well at least on my fedora box.

EDIT: Re the previous unit test failures because of inconsistent handling of /: I researched a bit more https://pubs.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap04.html#tag_04_11 and eventually came to the conclusion that path1 is likely more well behaved as it should either be empty or cwd or sdir in our parsers. Only path2 is user / app supplied and can be wrong (in that case the open* syscall does not succeed of course). I updated the unit tests to reflect this. I hope they are now more realistic. @LucaGuerra mind checking again and do you have more thoughts? Btw this refactor is really good for the project as we outsource the difficult and error prone task of path concatenation and normalization.

@incertum incertum force-pushed the refactor/string-concatenate-paths branch 2 times, most recently from c26a7d6 to 94b1f33 Compare December 2, 2023 19:28
@incertum
Copy link
Contributor

incertum commented Dec 2, 2023

Since we introduced std::filesystem for the newer fs.path.* I compared them:

  • First this implementations' fd.name always matched the old implementation in my manual tests, so we shouldn't be changing semantics and instead the refactored version should now be more robust in odd edge cases and more memory safe since we observed some stack buffer underflows on real servers with the old implementation
  • However, I noticed that fs.path.* can sometimes be wrong (best addressed in another PR)

When the path is relative, everything matches:

cat ..//app/build/../random/falco/

{"evt.res":"ENOENT","evt.type":"openat","fd.name":"/tmp/a/b/c/app/random/falco","fd.nameraw":"../app/build/../random/falco/","fs.path.name":"/tmp/a/b/c/app/random/falco/","fs.path.nameraw":"..//app/build/../random/falco/","proc.cwd":"/tmp/a/b/c/target-dir/"}

But there is a mismatch when the path is absolute:

cat /app/build/../../random/falco/
cat /app/build/../random/falco/

{"evt.res":"ENOENT","evt.type":"openat","fd.name":"/random/falco","fd.nameraw":"/app/build/../../random/falco/","fs.path.name":"/app/build/../../random/falco/","fs.path.nameraw":"/app/build/../../random/falco/","proc.cwd":"/tmp/a/b/c/target-dir/"}
{"evt.res":"ENOENT","evt.type":"openat","fd.name":"/app/random/falco","fd.nameraw":"/app/build/../random/falco/","fs.path.name":"/app/build/../random/falco/","fs.path.nameraw":"/app/build/../random/falco/","proc.cwd":"/tmp/a/b/c/target-dir/"}

CC @mstemm

Copy link
Contributor

@federico-sysdig federico-sysdig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments. I'm not approving for the moment as I see it is WIP, but I don't see major issues.

userspace/libsinsp/threadinfo.cpp Outdated Show resolved Hide resolved
uint32_t size = tinfo->m_cwd.size();

if(size == 0 || (tinfo->m_cwd[size - 1] != '/'))
if(len == 0 || (tinfo->m_cwd[len - 1] != '/'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for len.

Suggested change
if(len == 0 || (tinfo->m_cwd[len - 1] != '/'))
if(tinfo->m_cwd.empty() || tinfo->m_cwd.back() != '/'))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's better, after you respond to above's comment I can rewind the last 2 commits.

userspace/libsinsp/utils.cpp Outdated Show resolved Hide resolved
userspace/libsinsp/utils.cpp Show resolved Hide resolved
userspace/libsinsp/event.cpp Outdated Show resolved Hide resolved
@@ -488,12 +487,8 @@ uint8_t *sinsp_filter_check_event::extract_abspath(sinsp_evt *evt, OUT uint32_t
//
// Get the file path directly from the ring buffer.
//
parinfo = evt->get_param(3);
m_strstorage = sinsp_utils::concatenate_paths("", evt->get_param(3)->as<std::string_view>());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why concatenate when one item is an empty path? Is it because the function enforces a final slash? If so, wouldn't it be better to put the slash when the path is finally used for its purpose? I know this logic was already there, but it seems a lot of work to check the path at any point it is manipulated when it could be done only once at the final step. End of rant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also does the directory traversal if applicable. We wouldn't need this here? for this one perhaps let's do a 2 step, first get this one in, before risking breaking things and take the time to check more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was only curious. I understand this would be a larger change.

@incertum incertum changed the title wip: refactor(libsinsp): rewrite concatenate_paths with std::filesystem refactor(libsinsp): rewrite concatenate_paths with std::filesystem Dec 6, 2023
@incertum
Copy link
Contributor

incertum commented Dec 6, 2023

I think we can at least move it out of wip?

@incertum
Copy link
Contributor

incertum commented Dec 6, 2023

We still have an unrelated windows CI failure.

userspace/libsinsp/event.cpp Outdated Show resolved Hide resolved
userspace/libsinsp/utils.cpp Show resolved Hide resolved
@poiana poiana added size/XXL and removed size/XL labels Dec 10, 2023
@incertum incertum force-pushed the refactor/string-concatenate-paths branch from 12a686e to 62a8f99 Compare December 10, 2023 00:19
EXPECT_EQ("/app", res);

path1 = "/root/";
path2 = "../😉";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix falcosecurity/falco#2958

Additional note: @mstemm fs.path.* also doesn't do any sanitation in case we are interested in equalizing fd.name and fs.path.name in the future.

@LucaGuerra
Copy link
Contributor Author

LucaGuerra commented Dec 11, 2023

Thank you all for the reviews! I'm going through all comments and the Windows + shared libs failure to get this in a mergeable state :)

@leogr leogr added this to the 0.14.0 milestone Dec 11, 2023
@LucaGuerra
Copy link
Contributor Author

LucaGuerra commented Dec 11, 2023

So the Windows failure is kind of expected, and it's a problem.

This is because we are hitting a problem that I did not consider earlier: STL will interpret paths in different ways depending on the platform. This is actually documented: https://en.cppreference.com/w/cpp/filesystem/path . The problem is that Windows has a concept of root-name that will be interpreted even if we are setting generic_format. For instance, //dirname/hello is split into //dirname as root-name and hello as the actual path. This instance will not bother us a lot, but imagine if a Windows-Falco is trying to interpret a perfectly valid unix relative path that starts with c:/, it will think that it's looking at an absolute path instead.

I think this is not acceptable in the current situation, so I thought about three solutions:

  • Instead of using std::filesystem, use an OSS library that is easy to import and has a compatible license which does not present this problem
  • Rewrite the C++ STL path normalization algorithm without the platform differences by hand
  • Revert this change and leave the hand-rolled implementation (I would not like to do it)

I'm thinking about trying to find a simple library that can solve our issues without having to have a custom algorithm here.

@incertum
Copy link
Contributor

So the Windows failure is kind of expected, and it's a problem.

This is because we are hitting a problem that I did not consider earlier: STL will interpret paths in different ways depending on the platform. This is actually documented: https://en.cppreference.com/w/cpp/filesystem/path . The problem is that Windows has a concept of root-name that will be interpreted even if we are setting generic_format. For instance, //dirname/hello is split into //dirname as root-name and hello as the actual path. This instance will not bother us a lot, but imagine if a Windows-Falco is trying to interpret a perfectly valid unix relative path that starts with c:/, it will think that it's looking at an absolute path instead.

I think this is not acceptable in the current situation, so I thought about three solutions:

* Instead of using `std::filesystem`, use an OSS library that is easy to import and has a compatible license which does not present this problem

* Rewrite the C++ STL path normalization algorithm without the platform differences by hand

* Revert this change and leave the hand-rolled implementation (I would not like to do it)

I'm thinking about trying to find a simple library that can solve our issues without having to have a custom algorithm here.

While I honor us supporting multi-platform for scap files and plugins, I don't think this should sacrifice the primary Linux syscalls monitoring, plus we already use std::filesystem for fs.path*. Instead add a workaround ifdef for windows I would say.

@LucaGuerra
Copy link
Contributor Author

Since we are already using std::filesystem we can probably also implement a workaround for the edge cases we are thinking about. Not thrilled but I would rather use STL over third-party components.

@LucaGuerra LucaGuerra force-pushed the refactor/string-concatenate-paths branch from 299af28 to 3eae1ad Compare December 12, 2023 12:44
@LucaGuerra LucaGuerra force-pushed the refactor/string-concatenate-paths branch from 3eae1ad to b02351f Compare December 12, 2023 12:50
@LucaGuerra LucaGuerra force-pushed the refactor/string-concatenate-paths branch from b02351f to 217bcee Compare December 12, 2023 13:10
@LucaGuerra
Copy link
Contributor Author

@incertum @federico-sysdig I ended up implementing the workaround. All tests are passing including the corner cases we identified. It's not the best but it will do. Also, the fs.* function now points to this implementation since it was doing the same thing.

@incertum
Copy link
Contributor

@incertum @federico-sysdig I ended up implementing the workaround. All tests are passing including the corner cases we identified. It's not the best but it will do. Also, the fs.* function now points to this implementation since it was doing the same thing.

Thanks a bunch @LucaGuerra LGTM! @federico-sysdig could we get another review pass from you, ty! 🙏

Copy link
Contributor

@federico-sysdig federico-sysdig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just nits that can be read, understood, and ignored

@@ -4470,14 +4447,12 @@ void sinsp_parser::parse_getcwd_exit(sinsp_evt *evt)
return;
}

parinfo = evt->get_param(1);
std::string cwd = std::string(evt->get_param(1)->as<std::string_view>());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the type name need not be repeated. Just noticed, but it can be left as it is.

		std::string cwd(evt->get_param(1)->as<std::string_view>());

@@ -232,30 +230,21 @@ bool sinsp_filter_check_fd::extract_fdname_from_creator(sinsp_evt *evt, OUT uint
}

parinfo = etype == PPME_SYSCALL_OPENAT_X ? enter_evt.get_param(1) : evt->get_param(2);
name = parinfo->m_val;
namelen = parinfo->m_len;
std::string_view name = parinfo->as<std::string_view>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Perhaps we can avoid repeating the type in this kind of definitions as it is already on the RHS. Just a thought for the future. Leave it as it is for the moment.

			auto name = parinfo->as<std::string_view>();

@poiana
Copy link
Contributor

poiana commented Dec 12, 2023

@federico-sysdig: changing LGTM is restricted to collaborators

In response to this:

lgtm, just nits that can be read, understood, and ignored

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Contributor

@incertum incertum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I also contributed my

/approve

is on behalf of @federico-sysdig, thanks a bunch for your help!

@poiana
Copy link
Contributor

poiana commented Dec 12, 2023

LGTM label has been added.

Git tree hash: 0765025267b638a9aa0f633d3a07676a22268a64

Copy link
Member

@Andreagit97 Andreagit97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@poiana
Copy link
Contributor

poiana commented Dec 12, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, federico-sysdig, incertum, LucaGuerra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Andreagit97,LucaGuerra,incertum]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants