SerializedBufferingStrategy issues and optimizations #32740
robertomczak started this conversation in General
Topic
SerializedBufferingStrategy might be causing issues in some cases.
Relevant information
Hello Airbyte community.
We currently run Airbyte deployed on EKS via Helm. This deployment handles MySQL -> S3, PostgreSQL ->, Amplitude -> S3, and a few other connections.
Note: Each MySQL source has two DBs being sourced, one with 30 tables and the second with only 4.
While operating these connections, we've observed that the schema containing 30 tables (CDC replicated) has various issues, including:
Possible root cause of issues:
"This issue might be caused by a long pause of TaskManager, such as a prolonged full GC, at which time the MySQL server thinks the client is not responsive so it stops transmitting binlog to the client, resulting in the EOFException."
In this case, such a pause might be linked to frequent flushAllBuffers calls.
It doesn't look like a connection or binlog corruption issue, as the other schema (using the same binlog) is synchronizing correctly.
Possible solutions:
One possible solution is to expose MAX_CONCURRENT_STREAM_IN_BUFFER (DEFAULT_MAX_CONCURRENT_STREAM_IN_BUFFER) through, e.g., environment variables. This would allow tuning how many concurrent streams a worker can handle without calling flushAllBuffers frequently.
Optimize stream assignment, e.g. split the streams belonging to a single connection across multiple workers based on the MAX_CONCURRENT_STREAM_IN_BUFFER setting.
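To make the two proposals concrete, here is a minimal, standalone Java sketch. It is not Airbyte's actual code. The class `BufferConfigSketch`, both method names, the use of `MAX_CONCURRENT_STREAM_IN_BUFFER` as an environment variable name, and the fallback value of 10 are all assumptions made for illustration. The first method reads the limit from the environment and falls back to a default. The second splits a connection's streams into groups no larger than that limit, so each group could in principle be handled by a separate worker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; none of these names exist in Airbyte.
class BufferConfigSketch {

  // Hypothetical env variable name, reusing the constant name from the discussion.
  static final String ENV_KEY = "MAX_CONCURRENT_STREAM_IN_BUFFER";

  // Returns the configured limit, or defaultValue when the variable is
  // missing, blank, non-numeric, or not strictly positive.
  static int resolveMaxConcurrentStreams(Map<String, String> env, int defaultValue) {
    String raw = env.get(ENV_KEY);
    if (raw == null || raw.isBlank()) {
      return defaultValue;
    }
    try {
      int parsed = Integer.parseInt(raw.trim());
      return parsed > 0 ? parsed : defaultValue;
    } catch (NumberFormatException e) {
      return defaultValue;
    }
  }

  // Splits the stream names into consecutive groups of at most maxPerWorker,
  // preserving the original order.
  static List<List<String>> partitionStreams(List<String> streams, int maxPerWorker) {
    if (maxPerWorker <= 0) {
      throw new IllegalArgumentException("maxPerWorker must be positive");
    }
    List<List<String>> groups = new ArrayList<>();
    for (int i = 0; i < streams.size(); i += maxPerWorker) {
      int end = Math.min(i + maxPerWorker, streams.size());
      groups.add(new ArrayList<>(streams.subList(i, end)));
    }
    return groups;
  }

  public static void main(String[] args) {
    // 10 is a placeholder default chosen for this sketch.
    int max = resolveMaxConcurrentStreams(System.getenv(), 10);
    List<String> streams = new ArrayList<>();
    for (int i = 1; i <= 30; i++) {
      streams.add("table_" + i);
    }
    List<List<String>> groups = partitionStreams(streams, max);
    System.out.println(groups.size() + " group(s) for limit " + max);
  }
}
```

With a limit of 10, the 30-table schema described above would be split into three groups of 10 streams each. Whether the platform could actually schedule such groups on separate workers is a separate design question that this sketch does not address.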