[BUG] SCDF not available due to skipper crash #1067
Labels
bug (Something isn't working)
CCB (Issue for CCB)
ops (Ticket from ADS operation team)
priority:major (Set the priority to major because the production is heavily impacted)
Environment:
Current Behavior:
SCDF has been frequently unavailable since 04/08/2023: about half the time, it is impossible to deploy any stream or to access the SCDF GUI.
Expected Behavior:
SCDF is available.
Steps To Reproduce:
Connect to the SCDF GUI.
Test execution artefacts (i.e. logs, screenshots…)
Here is the memory consumption graph:
Here is the error log:
Error_SCDF_heap_space.txt
Whenever possible, first analysis of the root cause
The pod spring-cloud-dataflow-skipper is frequently in CrashLoopBackOff (liveness/readiness probes failing).
The logs indicate that the application runs out of Java heap space.
No Java heap size is set explicitly, and the default value in effect is unknown.
We explicitly set the Java heap space to 1024 MB and restarted the pod.
Since then, the problem has not recurred.
The pod's memory usage has increased from ~850 MB to ~1350 MB.
Workaround
Increase the Java heap space in the SCDF Skipper deployment:
Add the following two lines to the "env" section:
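As an illustration only, here is a minimal sketch of what such an "env" entry might look like. It assumes the heap limit is passed through the standard JAVA_TOOL_OPTIONS environment variable, which the JVM reads at startup. The variable actually used in our deployment may differ, so treat the name as an assumption; only the 1024 MB value comes from the analysis above.

```yaml
# Hypothetical fragment of the Skipper container spec in the deployment.
# The variable name is an assumption; the 1024 MB value is the one tested above.
env:
  - name: JAVA_TOOL_OPTIONS   # picked up by the JVM at startup
    value: "-Xmx1024m"        # cap the Java heap at 1024 MB
```

If the container already defines an "env" list, only the name/value pair needs to be appended to it. The pod must be restarted for the setting to take effect.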
Bug Generic Definition of Ready (DoR)
Bug Generic Definition of Done (DoD)