
Support for dynamic prefix field based s3 directory separation #44

Open
wants to merge 3 commits into main
Conversation

elyscape

Based on #33, but rebased and with the author's email set to the personal address listed on his GitHub profile, which is likely the one @nigoel used to sign the CLA.

From #33's description:

One may now provide dynamic prefixes like `logs/%{app}/%{type}/`. Fields are extracted from the event messages.

This creates a local temp file per prefix and applies a `time_file` watch on each prefix. The change also maintains a separate file lock for each prefix.

After the `no_event_wait` period, it resets the watch and cleans up the local files under a prefix. This happens when there has been no event for that prefix within `no_event_wait * time_file` of the last event.

Closes #33.
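For illustration, a pipeline using such a dynamic prefix might look like the following sketch. The option names here are assumed from the description above, not confirmed by this PR:

```
output {
  s3 {
    bucket => "my-logs"
    # Fields from each event are interpolated into the key prefix,
    # so events land in per-app/per-type directories on the bucket.
    prefix => "logs/%{app}/%{type}/"
    time_file => 5
  }
}
```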

@nigoel

nigoel commented Oct 29, 2015

Thanks @elyscape . I was struggling with the CLA. Thanks for updating it.

@sijis

sijis commented Nov 1, 2015

I'm really looking forward to this.

@elasticsearch-release

Jenkins standing by to test this. If you aren't a maintainer, you can ignore this comment. Someone with commit access, please review this and clear it for Jenkins to run; then say 'jenkins, test it'.

@suyograo

suyograo commented Nov 4, 2015

@elyscape can you please add tests to validate these changes?

@elyscape
Author

elyscape commented Nov 4, 2015

I'll take a look. @nigoel, if you can help on this some, that would be very appreciated.

@@ -8,6 +8,7 @@
require "thread"
require "tmpdir"
require "fileutils"
require 'pathname'
Contributor

Let's use the same style as the other requires:

require "pathname"

@ph
Contributor

ph commented Nov 5, 2015

@elyscape @nigoel this will take a bit more time; I need to check whether there are any thread-safety problems.
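One way the per-prefix locking described in the PR could be kept thread-safe is to guard the prefix-to-file map with a single mutex and give each prefix its own write lock. This is a hypothetical sketch under those assumptions, not the PR's actual code; class and method names are invented:

```ruby
require "thread"
require "tmpdir"
require "fileutils"

class PrefixedFileRepository
  def initialize(base_dir = Dir.mktmpdir("s3-output"))
    @base_dir = base_dir
    @files = {}    # prefix => open temp File
    @locks = {}    # prefix => Mutex for that prefix's writes
    @map_lock = Mutex.new  # guards the two maps above
  end

  # Expand a dynamic prefix template like "logs/%{app}/%{type}/"
  # using fields from the event hash.
  def resolve_prefix(template, event)
    template.gsub(/%\{(\w+)\}/) { event[Regexp.last_match(1)].to_s }
  end

  # Append a line to the temp file for this prefix, holding only
  # that prefix's lock while writing.
  def write(prefix, line)
    lock = @map_lock.synchronize { @locks[prefix] ||= Mutex.new }
    lock.synchronize do
      file = @map_lock.synchronize { @files[prefix] ||= open_file(prefix) }
      file.puts(line)
      file.flush
    end
  end

  private

  # Each prefix gets its own local directory so uploads map 1:1
  # onto S3 key prefixes.
  def open_file(prefix)
    dir = File.join(@base_dir, prefix)
    FileUtils.mkdir_p(dir)
    File.open(File.join(dir, "ls.s3.part0.txt"), "a")
  end
end
```

The map lock is only held briefly to look up or create entries, so writes to different prefixes never block each other.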

reset_page_counter
create_temporary_file
#reset_page_counter
#create_temporary_file
Contributor

We can remove this.

@sijis

sijis commented Nov 24, 2015

Is there anything I can do to move this along? Is it simply a matter of addressing the notes from @ph?

@nigoel

nigoel commented Nov 25, 2015

Thanks for the review. I will incorporate them.

@nigoel

nigoel commented Nov 26, 2015

@elyscape @ph I am not able to create a direct pull request for this because of CLA issues.
I have created a pull request with the changes incorporated in the elyscape repo. @elyscape, please help me here.

@elyscape
Author

Sorry for the delay. I grabbed the changes from @nigoel and added them to this PR with corrected authorship.

@NoumanSaleem

Would love to see this

@jsvd jsvd added the P3 label Apr 26, 2016
ph added a commit to ph/logstash-output-s3 that referenced this pull request Dec 15, 2016
**Motivation**
One of the most requested features was a way to add dynamic prefixes, using the fieldref
syntax, to the files on the bucket, along with pipeline changes to support a shared delegator.
The S3 output has always performed single-threaded writes while using multiple workers to process uploads; the code was thread-safe when used in the `:single` concurrency mode.

This PR addresses a few problems and provides shorter, more structured code:
- The plugin now uses the V2 version of the SDK, ensuring we receive the latest updates and changes.
- We now use S3's `upload_file` instead of reading chunks; this method is more efficient and uses multipart uploads with threads if the file is too big.
- You can now use the fieldref syntax in the prefix to dynamically change the target based on the events received.
- The upload queue is now a bounded list; this is necessary to communicate back pressure to the pipeline, and its size is configurable by the user.
- If the queue is full, the plugin will start the upload in the current thread.
- The plugin is now thread-safe and supports the `shared` concurrency model.
- The rotation strategy can be selected; the recommended one is `size_and_time`, which checks both configured limits (`size` and `time` are also available).
- The `restore` option now uses a separate thread pool with an unbounded queue.
- The `restore` option no longer blocks the launch of Logstash and uses fewer resources than the real-time path.
- The plugin now uses `multi_receive_encode`, which optimizes writes to the files.
- Rotate operations are now batched to reduce the number of IO calls.
- Empty files will not be uploaded by any rotation strategy.
- We now use Concurrent-Ruby for the implementation of the Java executor.
- If you have finer-grained permissions on prefixes or want a faster boot, you can disable the credentials check with `validate_credentials_on_root_bucket`.
- The credentials check will no longer fail if we can't delete the file.
- We now have a full suite of integration tests for all the defined rotations.

Fixes: logstash-plugins#4 logstash-plugins#81 logstash-plugins#44 logstash-plugins#59 logstash-plugins#50
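The bounded upload queue with a caller-runs fallback could be sketched as follows. This is an illustrative sketch using Ruby's `SizedQueue`, with invented class and option names, not the plugin's actual implementation:

```ruby
require "thread"

class BoundedUploader
  # queue_size and workers are illustrative defaults; the upload block
  # stands in for the real S3 upload call.
  def initialize(queue_size: 4, workers: 2, &upload)
    @upload = upload
    @queue = SizedQueue.new(queue_size) # bounded: blocking push would stall producers
    @threads = Array.new(workers) do
      Thread.new do
        # Each worker pops files until it receives the nil sentinel.
        while (file = @queue.pop)
          @upload.call(file)
        end
      end
    end
  end

  # Try a non-blocking push; if the queue is full, run the upload in
  # the current thread so back pressure reaches the pipeline.
  def enqueue(file)
    @queue.push(file, true) # raises ThreadError when the queue is full
  rescue ThreadError
    @upload.call(file)
  end

  # Send one sentinel per worker, then wait for them to drain the queue.
  def shutdown
    @threads.size.times { @queue.push(nil) }
    @threads.each(&:join)
  end
end
```

Running the upload inline when the queue is full is what turns a full queue into back pressure: the pipeline thread slows down instead of buffering without bound.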
elasticsearch-bot pushed a commit that referenced this pull request Dec 15, 2016
Fixes: #4 #81 #44 #59 #50

Fixes #102