Java heapspace on source connector #51616

Open
scottmorgan-payroc opened this issue Jan 17, 2025 · 0 comments
Labels
area/abctl (Issues with the abctl quickstart cli), cdc, community, connectors/source/mssql, team/db-dw-sources (Backlog for Database and Data Warehouse Sources team), type/bug (Something isn't working)


What happened?

Background: the production table we are trying to sync with Airbyte has ~1 billion rows. Airbyte can sync this table from scratch without any issues. During the month there are daily inserts, updates, and deletes of about 7-10 million rows on average, and those CDC syncs succeed. Once a month, a month-end process inserts, updates, or deletes potentially 70-100 million rows. Airbyte CDC falls over once we reach roughly 70 million changes in a single run. Is there a block diagram showing the pods required and the cluster resources used during the CDC process? Our monitoring telemetry shows plenty of RAM and disk available, so there seems to be a configuration gap for the one connection we are blocked on.

The pattern is always the same: the connection 7eaf693f-4cbe-4e21-83d8-94e2357bb9f4 tries to perform CDC on one of our tables but fails with a Java heap space error. The log indicates only 10 MB being used despite a large max memory setting. We are using the latest version of abctl. We are not sure how the source connector's Java heap can run out when it is only using 10 MB and the server shows more than 10 GB of RAM free.

We set the connection-level resource config directly in the database using psql:
psql -U airbyte -d db-airbyte -t -A -c "UPDATE connection SET resource_requirements = '{"cpu_limit": "1.5", "cpu_request": "1.0", "memory_limit": "20Gi", "memory_request": "13Gi"}' WHERE id = '7eaf693f-4cbe-4e21-83d8-94e2357bb9f4';"
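The psql command above nests double-quoted JSON keys inside a double-quoted shell argument. If it is typed exactly as shown, a POSIX shell consumes the inner double quotes, so the JSON may not reach the database as written (the issue text may also simply have lost its escaping). Below is a minimal sketch of one way to build the same command with the quoting handled for you. The helper names `build_update_sql` and `build_psql_command` are hypothetical; the table, column, and values are the ones quoted in the report.

```python
import json
import shlex

# Values copied from the report above.
CONNECTION_ID = "7eaf693f-4cbe-4e21-83d8-94e2357bb9f4"
RESOURCE_REQUIREMENTS = {
    "cpu_limit": "1.5",
    "cpu_request": "1.0",
    "memory_limit": "20Gi",
    "memory_request": "13Gi",
}


def build_update_sql(connection_id: str, requirements: dict) -> str:
    """Return the UPDATE statement with the requirements as a JSON string literal."""
    payload = json.dumps(requirements)
    # Double any single quote so the JSON stays inside one SQL string literal.
    payload = payload.replace("'", "''")
    safe_id = connection_id.replace("'", "''")
    return (
        "UPDATE connection "
        f"SET resource_requirements = '{payload}' "
        f"WHERE id = '{safe_id}';"
    )


def build_psql_command(sql: str) -> str:
    """Return a shell command line that passes the SQL as one quoted argument."""
    argv = ["psql", "-U", "airbyte", "-d", "db-airbyte", "-t", "-A", "-c", sql]
    # shlex.join quotes each argument so the shell hands it to psql unchanged.
    return shlex.join(argv)


if __name__ == "__main__":
    print(build_psql_command(build_update_sql(CONNECTION_ID, RESOURCE_REQUIREMENTS)))
```

Running a `SELECT resource_requirements FROM connection WHERE id = ...` afterwards would confirm what was actually stored. Whether Airbyte applies this column to the source pod at all is not something this sketch verifies.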

SQL Server 2019 CU30
I've configured JAVA_OPTS in values.yaml and connection-specific memory parameters, but I still run into the Java heap space error when processing large SQL Server CDC syncs. Airbyte is able to initialize a transaction table with a billion rows but fails on a 70 million row CDC sync. A 7 million row CDC sync works.

values.yaml
global:
  edition: "community"
  jobs:
    resources:
      limits:
        cpu: 1000m
        memory: 2Gi ## e.g. 500m
      requests:
        cpu: 250m
        memory: 1Gi
  env_vars:
    HTTP_IDLE_TIMEOUT: 1800s
    DEBEZIUM_MAX_QUEUE_SIZE_IN_BYTES: 536870912
    #LOG_LEVEL: DEBUG
    CDC_LOG_LEVEL: DEBUG
    #DEBEZIUM_LOG_LEVEL: DEBUG
    MSSQL_CDC_LOG_LEVEL: DEBUG
    JOB_MAIN_CONTAINER_MEMORY_REQUEST: 1Gi
    JOB_MAIN_CONTAINER_MEMORY_LIMIT: 8Gi
    NORMALIZATION_JOB_MAIN_CONTAINER_MEMORY_REQUEST: 1Gi
    NORMALIZATION_JOB_MAIN_CONTAINER_MEMORY_LIMIT: 8Gi
    JAVA_OPTS: "-XX:+ExitOnOutOfMemoryError -XX:MaxRAMPercentage=80.0 -XX:+UseG1GC"
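Because `-XX:MaxRAMPercentage` sizes the JVM heap as a fraction of the memory the JVM sees (the container limit, when container support is active), the heap ceiling depends on which of the limits above actually reaches the source connector pod. The following is a rough arithmetic sketch, not a description of how Airbyte resolves these settings. The helper names are hypothetical, and it assumes the connector's JVM actually receives the `JAVA_OPTS` value.

```python
# Binary unit multipliers for Kubernetes-style memory quantities.
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_quantity(quantity: str) -> int:
    """Convert a quantity such as '8Gi' into bytes (binary suffixes only)."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # no suffix: already a plain byte count


def max_heap_bytes(container_limit: str, max_ram_percentage: float) -> int:
    """Heap ceiling implied by -XX:MaxRAMPercentage for a given container limit."""
    return int(parse_quantity(container_limit) * max_ram_percentage / 100.0)


def to_gib(num_bytes: int) -> float:
    """Express a byte count in GiB."""
    return num_bytes / 1024**3


if __name__ == "__main__":
    # Limits that appear in this report; which one the source pod gets is unknown here.
    for limit in ("2Gi", "8Gi", "20Gi"):
        heap = max_heap_bytes(limit, 80.0)
        print(f"limit {limit}: heap ceiling about {to_gib(heap):.1f} GiB")

    queue = parse_quantity("536870912")
    print(f"DEBEZIUM_MAX_QUEUE_SIZE_IN_BYTES = {queue / 1024**2:.0f} MiB")
```

Under these assumptions the three limits give heap ceilings of roughly 1.6, 6.4, and 16.0 GiB, and the Debezium queue setting is 512 MiB. Comparing those figures with the limits shown by `kubectl describe pod` on the running source pod would show which configuration layer is taking effect.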


What did you expect to happen?

Airbyte to sync using CDC without error, and the pods not to run out of memory given that we have allowed them 10+ GB. The max batch size on the SQL Server connection is set to 1000 rows.

Abctl Version

$ abctl version
version: v0.24.0

Docker Version

$ docker version
Docker version 24.0.7, build 24.0.7-0ubuntu2~22.04.1



### OS Version

<details>

```console
# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
```

</details>