[UI Feature] Propagate dynamic workflow error messages from flytepropeller to UI #4466
Comments
I suspect this is related to #4483. These use a flyteidl error that is wrapped to support the |
@fg91 do you have a minimal repro? Think I found the issues and fixed similarly for branch node error reporting, but want to double check here. |
Sorry for the late reply @hamersaw. This is a minimal repro:

    from flytekit import task, dynamic, workflow

    @task
    def foo(a = 1): # Notice the missing type hint
        print(a)

    @dynamic
    def sub_wf():
        foo(a=1)

    @workflow
    def wf():
        sub_wf()

The pod succeeds but the dynamic task fails without any error shown in the UI.

Propeller logs show:

    E1219 15:53:40.603043 1 workers.go:103] error syncing '.../f96f28b7a6aa64183869': 0: [User] malformed dynamic workflow, caused by: Collected Errors: 1
    Error 0: Code: MismatchingTypes, Node Id: dn0, Description: Variable [a] (type [simple:INTEGER ]) doesn't match expected type [simple:NONE ].
    1: 0: [User] malformed dynamic workflow, caused by: Collected Errors: 1
    Error 0: Code: MismatchingTypes, Node Id: dn0, Description: Variable [a] (type [simple:INTEGER ]) doesn't match expected type [simple:NONE ]. |
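For reference, the `MismatchingTypes` error in this repro traces back to the untyped parameter `a`: the log shows the expected type as `simple:NONE` while the dynamic workflow passes an `INTEGER`, so annotating it (e.g. `a: int = 1`) should avoid the mismatch. Until the UI surfaces the error, a local pre-check can flag such parameters before registration. The following is only a minimal sketch using the standard library; `find_untyped_params` is a hypothetical helper, not part of flytekit, and it is meant to be run on the plain function before it is decorated.

```python
import inspect


def find_untyped_params(fn):
    """Return the names of parameters of `fn` that lack a type annotation."""
    sig = inspect.signature(fn)
    return [
        name
        for name, param in sig.parameters.items()
        if param.annotation is inspect.Parameter.empty
    ]


# Plain functions standing in for task bodies (no flytekit decorators here).
def foo(a=1):  # untyped, as in the repro
    print(a)


def foo_fixed(a: int = 1):  # annotated variant
    print(a)


print(find_untyped_params(foo))        # ['a']
print(find_untyped_params(foo_fixed))  # []
```

A check like this could run in CI over the functions that later become tasks, so the missing hint is caught before the dynamic workflow is compiled at runtime.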
@fg91 thanks, this helps a lot. So I'm seeing this is a UI issue. Everything is stored correctly in FlyteAdmin (error included) but just not displayed in the UI. Running your example here, and then querying admin directly with
So propeller is reporting the error correctly; it just needs to be displayed in the UI. Will redirect accordingly. |
Thanks for testing @hamersaw 🙏 I understand that frontend is probably pretty busy already but it would help the UX of dynamic workflows a lot if they could implement this. (Unfortunately I have no clue about frontend things otherwise I'd be happy to tackle it myself 🙈) Our ML engineers make heavy use of dynamic workflows and we in the ops team get a question like "Why is my Flyte workflow in a weird succeeded yet failed state again?" every few days. |
01/18/2024 Contributor's meetup notes: Members agreed the UI should surface detailed error messages from propeller to aid users. John will investigate propagating errors appropriately in the UI. |
🤖 Hey @fg91, great to see you again! Trust you've been coding your way to glory since our last interaction. Let's dive into this new challenge you've got for us. I understand the issue you're facing and I agree that improving the FlyteConsole to display dynamic workflow related errors would greatly enhance the user experience. Here's a high-level approach to achieve this:
Please note that this is a high-level approach and the actual implementation may vary based on the existing architecture and design of the flytepropeller and FlyteConsole services.
Regarding the error handling mechanism in the flytepropeller service, it primarily uses the
I hope this helps! If you have any further questions or need more clarification, feel free to ask.
Sources
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot. |
It would've been very helpful for me too to have this exposed; I just spent an hour trying to understand how my dynamic both failed and succeeded at the same time. |
@davidmirror-ops please excuse me bringing this up again, but is there any chance the frontend team could tackle this in one of the upcoming sprints? These types of errors in dynamic workflows are almost impossible for users to debug without the help of a platform engineer, because you need to know that you have to search for the execution id in the propeller logs. Our users struggle with this again and again 🙈 |
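As a stopgap for the manual log search described above, the relevant errors can be pulled out of a propeller log dump with a small script. This is a rough sketch, not an official tool: it assumes the log layout matches the excerpt quoted earlier in this thread (a klog-style header line containing the execution id, followed by `Error N: Code: ..., Node Id: ..., Description: ...` lines), and `extract_errors` is a hypothetical helper name.

```python
import re

# Matches the per-error lines seen in the propeller excerpt in this thread.
ERROR_RE = re.compile(
    r"Error \d+: Code: (?P<code>\w+), Node Id: (?P<node>[^,]+), "
    r"Description: (?P<desc>.*)"
)
# Assumed klog-style line prefix, e.g. "E1219 ".
HEADER_RE = re.compile(r"^[EWIF]\d{4} ")


def extract_errors(log_text, execution_id):
    """Collect unique (code, node_id, description) tuples for one execution."""
    errors, active = [], False
    for line in log_text.splitlines():
        if HEADER_RE.match(line):
            # A new log record starts; track it only if it names our execution.
            active = execution_id in line
        if not active:
            continue
        m = ERROR_RE.search(line)
        if m:
            entry = (m["code"], m["node"], m["desc"].strip())
            if entry not in errors:  # the log repeats each error
                errors.append(entry)
    return errors


sample = """E1219 15:53:40.603043 1 workers.go:103] error syncing '.../f96f28b7a6aa64183869': 0: [User] malformed dynamic workflow, caused by: Collected Errors: 1
Error 0: Code: MismatchingTypes, Node Id: dn0, Description: Variable [a] (type [simple:INTEGER ]) doesn't match expected type [simple:NONE ].
1: 0: [User] malformed dynamic workflow, caused by: Collected Errors: 1
Error 0: Code: MismatchingTypes, Node Id: dn0, Description: Variable [a] (type [simple:INTEGER ]) doesn't match expected type [simple:NONE ].
"""

for code, node, desc in extract_errors(sample, "f96f28b7a6aa64183869"):
    print(f"{code} at node {node}: {desc}")
```

The execution id is matched as a plain substring of the header line, so the script only helps once the logs have already been exported to text.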
+1 I'm staring at a similarly baffling error msg, and I'm so far trying and failing to guess my way through my team's code to see which of our many
|
+1 I recently hit this issue as well. The issue was due to a type mismatch:
FlytePropeller logs did show the issue, but it didn't propagate to the UI. This change would make debugging dynamic workflow issues a lot easier. |
Update: we had a call today to refresh on this issue. Here is a quick summary:
|
Motivation: Why do you think this is important?
Our platform users regularly ping us with executions where a dynamic workflow failed to correctly register.
Flyte's UI doesn't provide helpful information to understand that the registration of the dynamic sub workflow has failed and how to fix it.
In this example, the dynamic task pod itself succeeded but the node failed:
We in the MLOps team then typically look in the flytepropeller logs to try to understand how the dynamic subworkflow needs to be fixed.
In this case, the propeller logs showed:
Goal: What should the final outcome look like, ideally?
It would be very helpful if FlyteConsole would show the dynamic workflow related errors occurring in flytepropeller so that users can fix them themselves.