Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(file source): Allow file pattern with curly braces in include directive #5927

Closed
wants to merge 14 commits into from

Conversation

pablosichert
Copy link
Contributor

@pablosichert pablosichert commented Jan 7, 2021

Resolves #5003.

This PR makes it possible to use glob patterns with curly braces in the file source, e.g. {foo,bar}.txt.

@pablosichert pablosichert force-pushed the pablosichert/file-pattern-curly-braces branch 3 times, most recently from 4a95559 to 178f889 Compare January 8, 2021 02:00
@pablosichert
Copy link
Contributor Author

Things I noticed while implementing / follow-up issues:

  • The exclude_patterns do not support curly-brace patterns, because they rely on glob::Pattern. However, the globwalk crate itself doesn't provide any means to just pattern match without walking the FS.
  • Additional options provided by the globwalk builder could be propagated to the source config options. E.g. follow_links or case_insensitive would be useful.

@binarylogic
Copy link
Contributor

Nice work @pablosichert! Would you mind opening issues for both of those points? That helps us decide if we should defer or schedule them for work.

@pablosichert
Copy link
Contributor Author

Thanks! Done.

Copy link
Member

@bruceg bruceg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a pretty good and straightforward move to make this happen. I do have a few concerns I left inline below.

benches/files.rs Show resolved Hide resolved
lib/file-source/src/paths_provider/glob.rs Outdated Show resolved Hide resolved
lib/file-source/src/paths_provider/glob.rs Outdated Show resolved Hide resolved
lib/file-source/tests/glob.rs Outdated Show resolved Hide resolved
src/sources/file.rs Show resolved Hide resolved
@pablosichert pablosichert force-pushed the pablosichert/file-pattern-curly-braces branch from 04e7ae9 to 3954aed Compare January 10, 2021 07:22
@pablosichert pablosichert requested a review from bruceg January 10, 2021 07:24
Copy link
Contributor

@ktff ktff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are just the few enhancements that can be done on errors, otherwise this looks good.

lib/file-source/src/paths_provider/glob.rs Outdated Show resolved Hide resolved
lib/file-source/src/paths_provider/glob.rs Outdated Show resolved Hide resolved
lib/file-source/src/paths_provider/glob.rs Outdated Show resolved Hide resolved
lib/file-source/src/paths_provider/glob.rs Outdated Show resolved Hide resolved
lib/file-source/src/paths_provider/glob.rs Outdated Show resolved Hide resolved
lib/file-source/src/file_server.rs Outdated Show resolved Hide resolved
lib/file-source/src/file_server.rs Outdated Show resolved Hide resolved
@ktff
Copy link
Contributor

ktff commented Jan 10, 2021

Failure of tests::api_graphql_files_source_metrics seams to be relevant.

@pablosichert
Copy link
Contributor Author

@ktff good catch.

I found out that globwalk::glob neither accepts patterns that start with ./ nor ones that point directly to a file with an absolute path. Looking into how to resolve that.

@pablosichert
Copy link
Contributor Author

That exact bug/behavior is mentioned in Gilnaa/globwalk#28.

Looks like we don't have an easy path out here.

lib/file-source/src/file_server.rs Show resolved Hide resolved
lib/file-source/src/file_server.rs Show resolved Hide resolved
lib/file-source/src/paths_provider/glob.rs Outdated Show resolved Hide resolved
lib/file-source/src/paths_provider/glob.rs Outdated Show resolved Hide resolved
lib/file-source/tests/glob.rs Show resolved Hide resolved
@ktff
Copy link
Contributor

ktff commented Jan 12, 2021

I found out that globwalk::glob neither accepts patterns that start with ./ nor ones that point directly to a file with an absolute path.

That is quite problematic since it will break configurations that rely on it, so this can't be merged until we figure something out or rethink our approach here.

I think helping to implement wanted functionalities in either globwalk or glob is a good option.

@pablosichert pablosichert requested a review from a team January 16, 2021 19:22
@@ -27,7 +27,9 @@ pub mod glob;
pub trait PathsProvider {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the design choices for the path provider was that it's never supposed to push it's error to the caller while enumerating the paths.

My thought process was this.
If the paths provider errors out - how would the caller - file server handle it? It is a meaningful error condition to recognize and act upon there? I deemed not, and decided that there's no good purpose to allow errors to be pushed to the file server. If the error is temporary - the valid behaviour would be to just log and return an empty iter. If the error is critial, and we should terminate the file server - then we should pass the error, however that situation just couldn't occur at the time.
There is another way I can think of that file server can react to an error - and that is, upon receiving a certain kind of error - just skip the current update, and keep using the paths obtained from the previous iterations. It may be useful for some kinds of use cases, like when the paths provider suddenly became unavailable, but we would rather use stale data and hope for the path provider to become available again at next call than terminate the file server. That,s however, also wasn't useful at the time.

The situation may have changed, and the change might be justified, but I'd like to validate that this way of handling error is what we're looking for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That for the rationale. Also don't have a clear cut answer to this, but my thoughts were the following:

It is a meaningful error condition to recognize and act upon there?

This information can be encoded in/inferred from the associated Error type. In the case of the Glob PathProvider and GlobError, the caller can decide that e.g. InvalidIncludePattern is unrecoverable because it is a programmer error, whereas a WalkError can be temporary (due to the directory structure it is executed in), and ignored.

I would even argue that the caller has more information available if the error is recoverable or not, rather than deciding it in the paths implementation. With this change, the caller can still decide to ignore all errors, but the inverse is not true. If the paths implementation decides to unwrap and panic, the caller has little chance to override this behavior.

Copy link
Contributor

@MOZGIII MOZGIII Jan 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This information can be encoded in/inferred from the associated Error type.

Correct. The caller can't be coupled with an exact error type to carry this information though - otherwise it won't be generic. To establish a generic protocol we either have to add a trait to the error type, or use a trait-owned wrapper type to encode this data - i.e. Result<..., PathsError<Self::Error>>, where enum PathsError<T> { UsePreviousPaths(T), Fatal(T) }.

Yet another question is how much sense does this protocol extension make. In other words - do we need the caller need to be able to decide to ignore the errors, etc? I mean, currently, we're just doing ? there. Do we need to further extend this protocol?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right. We don't specialize the PathProvider yet in the FileServer::run function.

i.e. Result<..., PathsErrorSelf::Error>, where enum PathsError { UsePreviousPaths(T), Fatal(T) }.

It makes the trait a bit more verbose, but it does exactly solve the problem we are discussing. I'm leaning for it.

Comment on lines +98 to +100
let glob = glob(include_pattern).with_context(|| InvalidIncludePattern {
glob: include_pattern.to_owned(),
})?;
Copy link
Contributor

@MOZGIII MOZGIII Jan 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In relation to https://github.com/timberio/vector/pull/5927/files#r559039229, and since we're validating that the include patterns are valid at construction, it might be meaningful to assume they're valid here and elide the possibility of an error via an unwrap.
If I understand correctly, once we've validated the patterns above, it is not possible that they will actually return errors here?

We might want to introduce a dedicated type wrapper that would contain the new(patterns) -> Result<Self, ...> and build(&self) -> GlobWalker to better capture this invariant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, either unwrapping or parsing into a more approriate type would be the ideal behavior here.

Let's see what we can do here after the changes to glob or globwalk.

match result {
Ok(()) => {}
Err(error) => {
error!(message = "Output channel closed.", error = ?error);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log message should probably be preserved.

@jszwedko
Copy link
Member

It seems like this is still blocked by lack of support upstream in glob. I'll close this out as stale, but leave the branch so we can revisit it later if things change.

@jszwedko jszwedko closed this Jun 23, 2021
@pablosichert pablosichert deleted the pablosichert/file-pattern-curly-braces branch January 6, 2022 17:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Curly braces in the include directive, file pattern
6 participants