
[pkg/stanza/fileconsumer] Add ability to read files asynchronously #25884

Closed

Conversation

VihasMakwana
Contributor

Description: Added a new feature gate that enables a thread pool mechanism to respect the poll_interval parameter.

Current Scenario:

If a file takes longer than poll_interval to consume, the current implementation blocks until the file is consumed entirely. In other words, it doesn't respect poll_interval.

Improvement using thread pooling:

In a thread pool model, the backend queues the files as it proceeds and doesn't wait for them to be consumed; all reading happens asynchronously.
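To make the intended model concrete, here is a minimal, self-contained Go sketch of the worker-pool idea (illustrative only; the names are made up and this is not the PR's actual code): poll() only enqueues readers, and a fixed set of workers drains the queue asynchronously.

```go
package main

import (
	"context"
	"sync"
)

// reader is a hypothetical stand-in for the fileconsumer reader type.
type reader struct{ path string }

func (r *reader) ReadToEnd(ctx context.Context) { /* consume the file to its end */ }

// pool is a minimal worker pool: poll() only enqueues readers, and the workers
// drain the queue asynchronously, so a slow file never blocks the poll loop.
type pool struct {
	queue chan *reader
	wg    sync.WaitGroup
}

func newPool(ctx context.Context, workers int) *pool {
	p := &pool{queue: make(chan *reader, workers)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for r := range p.queue {
				r.ReadToEnd(ctx) // reading happens off the poll goroutine
			}
		}()
	}
	return p
}

// poll enqueues the readers found in this cycle without waiting for them to be
// consumed, which is what lets poll_interval be respected.
func (p *pool) poll(readers []*reader) {
	for _, r := range readers {
		p.queue <- r
	}
}

func (p *pool) stop() {
	close(p.queue)
	p.wg.Wait()
}

func main() {
	p := newPool(context.Background(), 4)
	p.poll([]*reader{{path: "a.log"}, {path: "b.log"}})
	p.stop()
}
```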

Link to tracking Issue: #18908

Testing: Nothing new added; existing tests were modified to account for the feature gate.

@github-actions github-actions bot requested a review from djaglowski August 18, 2023 12:42
@VihasMakwana VihasMakwana force-pushed the filelog_threadpool_feature branch 2 times, most recently from 84a9adc to 38a697d Compare August 18, 2023 12:46
@VihasMakwana VihasMakwana force-pushed the filelog_threadpool_feature branch from 38a697d to 1e3c11e Compare August 18, 2023 12:49
@VihasMakwana VihasMakwana force-pushed the filelog_threadpool_feature branch from ca1d7a2 to 93dc34a Compare August 18, 2023 12:59
@VihasMakwana VihasMakwana marked this pull request as ready for review August 22, 2023 13:54
@VihasMakwana VihasMakwana requested a review from a team August 22, 2023 13:54
@VihasMakwana
Contributor Author

@djaglowski PR is up for review ;)

Member

@djaglowski djaglowski left a comment


Thanks @VihasMakwana. This is my initial pass at the review. I'm still trying to wrap my head around some aspects of this implementation but I wanted to leave initial thoughts.

I mentioned here that I think it's important for us to have benchmarks in our codebase that demonstrate the improvement we want to see from the change, but we also need to have some confidence that we're not meaningfully regressing in other cases such as those where files are typically about the same size.

pkg/stanza/fileconsumer/file_threadpool.go (outdated; resolved)
Comment on lines 132 to 142
func (m *Manager) isCurrentlyConsuming(fp *fingerprint.Fingerprint) bool {
	m.trieLock.RLock()
	defer m.trieLock.RUnlock()
	return m.trie.HasKey(fp.FirstBytes)
}

func (m *Manager) removePath(fp *fingerprint.Fingerprint) {
	m.trieLock.Lock()
	defer m.trieLock.Unlock()
	m.trie.Delete(fp.FirstBytes)
}
Member


Should we just move the locking into the trie?
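For illustration, here is a rough sketch of what moving the locking into the trie could look like, assuming a trie that exposes HasKey/Put/Delete (the map-backed type below is a stand-in, not the real internal trie):

```go
package fileconsumer

import "sync"

// concurrentTrie wraps the trie data with its own lock so callers don't need
// to manage a separate trieLock. The map is only a placeholder structure.
type concurrentTrie struct {
	mu   sync.RWMutex
	keys map[string]struct{}
}

func newConcurrentTrie() *concurrentTrie {
	return &concurrentTrie{keys: make(map[string]struct{})}
}

func (t *concurrentTrie) HasKey(key []byte) bool {
	t.mu.RLock()
	defer t.mu.RUnlock()
	_, ok := t.keys[string(key)]
	return ok
}

func (t *concurrentTrie) Put(key []byte) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.keys[string(key)] = struct{}{}
}

func (t *concurrentTrie) Delete(key []byte) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.keys, string(key))
}
```

With something like this, isCurrentlyConsuming and removePath become plain calls on the trie and the Manager no longer needs its own trieLock.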

Contributor Author


Well, we can. But I didn't go with that, as I was planning to add the Trie to the current (synchronous) model as well in the future.

Contributor Author


That's a whole different thing, we can discuss it in a different issue.

pkg/stanza/fileconsumer/file_threadpool.go (resolved)
pkg/stanza/fileconsumer/util_test.go (outdated; resolved)
pkg/stanza/fileconsumer/file_threadpool.go (resolved)
pkg/stanza/fileconsumer/file_threadpool.go (outdated; resolved)
@VihasMakwana
Contributor Author

@djaglowski I have added the benchmark test case.

Comment on lines 28 to 30
type fileSizeBenchmark struct {
	sizes [2]int
}
Member


Can we generalize this a bit more so we can define multiple scenarios very easily? Something like:

testCases := []fileSizeBenchmark{
	{
		name:          "varying_sizes",
		sizes:         map[int]int{10: 25, 5000: 25},
		maxConcurrent: 50,
	},
	{
		name:          "same_size_small",
		sizes:         map[int]int{10: 100},
		maxConcurrent: 50,
	},
	{
		name:          "same_size_small_throttled",
		sizes:         map[int]int{10: 100},
		maxConcurrent: 10,
	},
	{
		name:          "same_size_large",
		sizes:         map[int]int{5000: 50},
		maxConcurrent: 50,
	},
	{
		name:          "same_size_large_throttled",
		sizes:         map[int]int{5000: 50},
		maxConcurrent: 5,
	},
}


func BenchmarkFileSizeVarying(b *testing.B) {
	fileSize := fileSizeBenchmark{
		sizes: [2]int{b.N * 5000, b.N * 10}, // Half the files will be huge, other half will be smaller
Member


I'm not convinced it makes sense to scale the size of files. A benchmark should be able to characterize a specific task.

The way this is defined makes the task somewhat ambiguous and doesn't actually exercise the critical functionality very much. (i.e. only calls poll twice)

I think we should use fixed file sizes and have b.N represent the number of times we add a set of new logs and call poll.
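As a hypothetical illustration of that shape (nopPoller stands in for the real Manager; only the loop structure matters), the benchmark could look roughly like this:

```go
package fileconsumer_test

import (
	"context"
	"os"
	"path/filepath"
	"strings"
	"testing"
)

// poller stands in for the fileconsumer Manager; only poll matters for the sketch.
type poller interface{ poll(ctx context.Context) }

type nopPoller struct{}

func (nopPoller) poll(context.Context) {}

// BenchmarkAppendAndPoll sketches the suggested shape: file sizes stay fixed,
// and b.N counts how many times we append a batch of logs and run one poll cycle.
func BenchmarkAppendAndPoll(b *testing.B) {
	dir := b.TempDir()
	paths := []string{filepath.Join(dir, "small.log"), filepath.Join(dir, "large.log")}
	var op poller = nopPoller{} // a real Manager would be constructed here instead

	line := strings.Repeat("x", 100) + "\n"
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		for _, p := range paths {
			f, err := os.OpenFile(p, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
			if err != nil {
				b.Fatal(err)
			}
			// append a fixed batch of new logs each iteration
			if _, err := f.WriteString(strings.Repeat(line, 10)); err != nil {
				b.Fatal(err)
			}
			f.Close()
		}
		op.poll(context.Background()) // exactly one poll cycle per benchmark iteration
	}
}
```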

Contributor Author

@VihasMakwana VihasMakwana Sep 7, 2023


I think I have misnamed sizes.
Those are actually the number of logs added to the files.
If those two values are numbers of logs, as you said, is that fine by you?

Contributor Author


I probably need to rename it to logs or something.

Contributor Author


@djaglowski I have made the number of logs configurable and added some more params to better scale these tests.

pkg/stanza/fileconsumer/file_threadpool.go (outdated; resolved)
"github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer/internal/fingerprint"
)

type readerWrapper struct {
Member


Would this be a little more readable as readRequest or something else more descriptive about its purpose?

Contributor Author


readerEnvelope looks catchy, tbh. I have seen such naming in some client libraries that utilize channels.
Does that sound fine to you? We can go with either of them, tbh.

pkg/stanza/fileconsumer/file_threadpool.go (outdated; resolved)
Comment on lines 421 to 422
operator.poll(context.Background())
operator.poll(context.Background())
Member


Does this test fail without these?

Contributor Author


Yeah, it does. In the current implementation, roller.readLostFiles works per consume call, not per poll call. So for every batch, we call readLostFiles for the previous batch, and all of this happens within a single poll().

However, in the new implementation we need to determine whether a reader is old or not.
That is done on a generation basis: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/25884/files#diff-a918db427e06075a5fa059d7893f54556bfa1504726ba77b3491b644a64ab818R163.
It will only detect an old reader once it is 3 poll cycles old ;(
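For readers following along, here is a small illustrative sketch of the generation idea (made-up names, not the PR's exact code): known readers are aged once per poll(), and only readers older than a fixed number of cycles are treated as lost.

```go
package fileconsumer

// lostReaderGenerations mirrors the "3 poll cycles" threshold described above.
const lostReaderGenerations = 3

type knownReader struct {
	generation int
	// offset, fingerprint, etc. omitted
}

// agePreviousReaders is called once per poll(): it ages all known readers and
// splits off the ones old enough to be read one last time and closed.
func agePreviousReaders(known []*knownReader) (lost, kept []*knownReader) {
	for _, r := range known {
		r.generation++
		if r.generation >= lostReaderGenerations {
			lost = append(lost, r)
		} else {
			kept = append(kept, r)
		}
	}
	return lost, kept
}
```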

Contributor Author


So, you need 3 poll() calls.

pkg/stanza/fileconsumer/file_threadpool.go (outdated; resolved)
for _, r := range oldReaders {
	if m.isCurrentlyConsuming(r.Fingerprint) {
		r.Close()
	} else {
Member


What if we are not currently reading the file, but we did read the file again since this reader was created?

This is actually somewhat expected because we'll often see the same file in each poll interval. By the time we're discarding these old readers, we've probably read that file a couple more times.

Is there any chance that telling the old reader to read is going to start from an out of date offset?

Member


In the current implementation, we only read again if we've determined that the file has been moved out of the matcher path. This implementation just tries to read regardless.

Contributor Author


This implementation also reads it after we've determined that it's been moved out of the path based on generation. https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/25884/files#diff-a918db427e06075a5fa059d7893f54556bfa1504726ba77b3491b644a64ab818R163

Contributor Author


> What if we are not currently reading the file, but we did read the file again since this reader was created?

I don't think that's possible, because m.clearOldReadersConcurrent(ctx) updates knownFiles and clears out old readers before the readers for the current poll cycle are created.

The current poll() will begin only once the previous poll cycle has submitted all its readers to the thread pool, and the trie will be up to date.
So there's no chance that we'd add any new file to read between the previous poll() and the current clearOldReadersConcurrent().

Member


> I don't think that's possible, because m.clearOldReadersConcurrent(ctx) updates knownFiles and clears out old readers before the readers for the current poll cycle are created.

But it only removes readers from 3 poll cycles ago, so readers from the previous cycle are still there.

Member


> I don't think that's possible, because m.clearOldReadersConcurrent(ctx) updates knownFiles and clears out old readers before the readers for the current poll cycle are created.
>
> But it only removes readers from 3 poll cycles ago, so readers from the previous cycle are still there.

I still think this is a problem. When we ReadToEnd on a reader as it is discarded, it will start from an out-of-date offset. This will cause duplicate ingestion of data.

Contributor Author


Dan, if you don't mind, can you give me a walk-through with an example? I think we'll be able to sort it out fairly quickly then.

Member

@djaglowski djaglowski Sep 15, 2023


Sure, I'll try to be concise so hopefully this is clear enough.

Let's make the situation unrealistically simple and say that one line is written to a file in between each poll cycle. At each time, we create a reader that reads to the end of the file. Each of these is a reader for the same file, but they each belong to a different generation and have a different offset.

| Time | Actual Content | Offset After Reading |
|------|----------------|----------------------|
| t0 | `foo0\n` | 5 |
| t1 | `foo0\nfoo1\n` | 10 |
| t2 | `foo0\nfoo1\nfoo2\n` | 15 |
| t3 | `foo0\nfoo1\nfoo2\nfoo3\n` | 20 |

So at t3, when we decide to discard the reader from t0, if we ReadToEnd on that reader, we will read foo1\n, foo2\n... again, even though foo1\n was already read at t1, foo2\n was read at t2, etc.

Contributor Author


I see. What I can do is create a test case that simulates this and check it out. Should be simple.
Thanks 😊

Contributor Author


With a tweak to the logic and by respecting max_concurrent_files, we don't hold readers open for 3 poll() cycles anymore. So, I think we're good now.

To address the above-mentioned scenario, we remove a reader from m.knownFiles if we detect a match here.
Also, while copying a reader, we don't copy the generation.
So, every time a reader is copied, we remove the old reader from m.knownFiles and create a fresh reader with the generation set to 0.

The scenario would look something like this with knownFiles:

| Time | Actual Content | Offset After Reading | knownFiles (before poll()) | knownFiles (after poll()) |
|------|----------------|----------------------|----------------------------|---------------------------|
| t0 | `foo0\n` | 5 | [] | [reader{offset: 5, generation: 0}] |
| t1 | `foo0\nfoo1\n` | 10 | [reader{offset: 5, generation: 0}] | [reader{offset: 10, generation: 0}] |
| t2 | `foo0\nfoo1\nfoo2\n` | 15 | [reader{offset: 15, generation: 0}] | [reader{offset: 15, generation: 0}] |
| t3 | `foo0\nfoo1\nfoo2\nfoo3\n` | 20 | [reader{offset: 20, generation: 0}] | [reader{offset: 20, generation: 0}] |

Note: It is assumed that we finish reading asynchronously before the next poll() is executed.
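For clarity, here is a small hypothetical sketch of that copy step (illustrative names, not the PR's exact code): the old entry is dropped from knownFiles and a fresh reader is built from its offset with the generation reset to 0, so a stale copy cannot be re-read later.

```go
package fileconsumer

type knownReader struct {
	offset     int64
	generation int
}

type manager struct {
	knownFiles []*knownReader
}

func (m *manager) renewReader(old *knownReader) *knownReader {
	// drop the old entry so only the fresh reader remains tracked
	kept := m.knownFiles[:0]
	for _, r := range m.knownFiles {
		if r != old {
			kept = append(kept, r)
		}
	}
	fresh := &knownReader{
		offset:     old.offset, // continue from where the previous reader stopped
		generation: 0,          // the generation is intentionally not copied
	}
	m.knownFiles = append(kept, fresh)
	return fresh
}
```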

Member

@djaglowski djaglowski left a comment


I don't think this implementation is respecting max_concurrent_files. It uses this setting to set the number of workers, which represents the number of files that can be read simultaneously. However, we have 3 generations of files sitting open in knownFiles without any regard for this setting. The setting is intended to cap the number of open files, not the number of workers / actively read files.

@VihasMakwana
Contributor Author

Okay, I got your point.
How about we keep max_concurrent_files / 3 as the number of workers? Or max_concurrent_files / 2, like the current implementation?

@VihasMakwana
Contributor Author

By the way, does reducing the lost reader's generation check from 3 cycles to 2 cycles make sense to you? It might, because if a file is unavailable for 2 straight poll()s, it is unlikely to be available in the 3rd.

@djaglowski
Member

> By the way, does reducing the lost reader's generation check from 3 cycles to 2 cycles make sense to you? It might, because if a file is unavailable for 2 straight poll()s, it is unlikely to be available in the 3rd.

Probably, but I recall experiencing increased instability when dropping this in the past. Ultimately, I am hopeful we will find a design that gets away from this notion entirely.

@VihasMakwana
Contributor Author

@djaglowski I have re-introduced batching in this implementation. Respecting max_concurrent_files slipped my mind; my apologies.

I have introduced a new helper to take care of copy-truncate duplications here.

I have also tweaked the logic to clear old readers to respect the max_concurrent_files.

At the beginning of the batch, we close all the knownFiles after reading them one last time.
This makes sure that there are at most max_concurrent_files open files.

@VihasMakwana
Contributor Author

In the meantime, I'm working to reduce redundancy in test cases.

@VihasMakwana VihasMakwana force-pushed the filelog_threadpool_feature branch from 0de6126 to cf0ad9c Compare September 16, 2023 00:19
@VihasMakwana VihasMakwana force-pushed the filelog_threadpool_feature branch 2 times, most recently from 13b8064 to 0752cc3 Compare September 16, 2023 23:48
@VihasMakwana VihasMakwana force-pushed the filelog_threadpool_feature branch from 0752cc3 to cae3290 Compare September 16, 2023 23:50
}

func (m *Manager) clearOldReadersConcurrent(ctx context.Context) {
	m.knownFilesLock.Lock()
Contributor Author


@djaglowski I have made clearOldReadersConcurrent() a blocking call, based on observations made while running benchmarks.
We wait for all aggregated readers to finish reading before moving forward (a rough sketch follows the observations below).

Observations:

  • If the number of files matched is significantly higher than maxConcurrentFiles, then we are likely going to batch the files more than a couple of times.
  • This overwhelms the workers() and they'll all be busy.
  • If we try to send already-finished readers to the workers() anyway, it causes a bottleneck, and benchmarks show a significant delay in receiving logs.
  • It's better to create separate goroutines to process finished readers and let the workers() free up for the upcoming poll() cycle.
  • This way we respect max_concurrent_files, the workers() are happy, and all's good. The benchmark performs as expected.
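A minimal sketch of that blocking behaviour, written against stub types (the real Manager and readers differ; the names are only illustrative):

```go
package fileconsumer

import (
	"context"
	"sync"
)

// Stub types for illustration; the real fields live on the fileconsumer Manager.
type oldReader struct{}

func (r *oldReader) readToEnd(ctx context.Context) {} // read the file one last time
func (r *oldReader) close()                         {}

type asyncManager struct {
	knownFilesLock sync.Mutex
	knownFiles     []*oldReader
}

// clearOldReadersConcurrent drains the old readers in dedicated goroutines and
// blocks until they finish, instead of funnelling them through the busy worker pool.
func (m *asyncManager) clearOldReadersConcurrent(ctx context.Context) {
	m.knownFilesLock.Lock()
	old := m.knownFiles
	m.knownFiles = nil
	m.knownFilesLock.Unlock()

	var wg sync.WaitGroup
	for _, r := range old {
		wg.Add(1)
		go func(r *oldReader) {
			defer wg.Done()
			r.readToEnd(ctx)
			r.close()
		}(r)
	}
	wg.Wait() // block until every aggregated reader has finished reading
}
```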

@VihasMakwana
Contributor Author

@djaglowski does the overall PR look stable to you, given that it respects the current parameters and passes all the test cases?
If yes, I will shift my focus to fixing the pipeline now. Open to any thoughts/opinions, as always.

@VihasMakwana
Contributor Author

@djaglowski ping ^

Member

@djaglowski djaglowski left a comment


@VihasMakwana, thanks for iterating on this. It seems like the design has changed quite a bit and I'm struggling to wrap my head around all the nuances. As much as I'd like to see this move forward, I'm uncomfortable with how fluid the design seems to be and I lack confidence that we're making the right changes. I think a large part of the problem is that we're having to rework several overlapping concerns at the same time. Do you see any opportunities to separate concerns into individual PRs?

@VihasMakwana
Contributor Author

VihasMakwana commented Sep 28, 2023

Okay. On a quick note, I do think making individual PRs will help us move forward.

I can think of the following:

  1. A PR for the file threadpool ability (a minimalist design; we will expand it later once we are confident enough). I think we should have it as an internal package?
  2. A PR for the complete functionality, after the 1st PR is merged and we're confident enough to move forward.
  3. Test cases, enabling the feature gate, and integrating with the existing feature.

@djaglowski
Member

> A PR for the file threadpool ability (a minimalist design; we will expand it later once we are confident enough). I think we should have it as an internal package?

I'm open to this, though I'm not sure I understand exactly what you have in mind.


More generally, I think what we need is not necessarily a direct path to the design used in this PR, but rather a path that changes the way we manage the other concerns so that this PR does not need to worry about them.

For example, we have all these different sets of files, most of which don't necessarily need to be a particular concern of an asynchronous solution.

  1. Files we have found which need to be opened, fingerprinted and cross-referenced with files we've previously seen.
  2. Files we are currently reading.
  3. Files we recently read and have kept open in case they become "lost".
  4. Files we are reading again, because they became "lost".
  5. Files we have already read and closed, but have kept around so we know where to offset to if we see them again.

In other words, can we redesign the way we manage any of these in such a way that it is agnostic to the implementation of the others?

@VihasMakwana
Contributor Author

Okay. If I understand this correctly, what we need is a path for restructuring the PR so that it addresses individual concerns independently, without relying on the others, as you said.
If that's true, then I can work on it. Should be possible.

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Oct 13, 2023
@github-actions
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Oct 28, 2023