Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SAMZA-2465: Task inputs information lost when enabled RegExTopicGenerator or specified 'task.inputs' explicitly in non-legacy application #1286

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

alnzng
Copy link
Contributor

@alnzng alnzng commented Feb 21, 2020

Symptom

If the user’s non-legacy application enabled RegExTopicGenerator or specified task.inputs explicitly and specified the input streams in its application descriptor, the expectation from the user side should be that the application can consume messages from specified input streams and Kafka topics that matched specified regex patterns.

However, in current logic seems the input information from the application descriptor will be overrided by the information from RegExTopicGenerator or task.inputs in the config file, which means the user’s application can only consume from matched Kafka topics or the inputs specified in task.inputs.

Cause

The generated task inputs from the application descriptor are overrided by JobNodeConfigurationGenerator.mergeConfig function.

Changes

Merge generated inputs and original inputs before doing JobNodeConfigurationGenerator.mergeConfig function call.

Tests

  • All unit tests and integration tests are passed

API Changes

None

Upgrade Instructions

None

Usage Instructions

Noe

When the user's non-legacy application enables 'RegExtopicGenerator' or
specifies 'task.inputs' in the config file, the inputs information are
specified in 'ApplicationDescriptor' will lost.

Signed-off-by: Alan Zhang <[email protected]>
@alnzng alnzng requested a review from rmatharu-zz February 21, 2020 04:27
@mynameborat
Copy link
Contributor

@bkonold can you take a look at this PR?

@bkonold
Copy link
Contributor

bkonold commented Jun 11, 2020

It is my understanding that we override generated configs to allow for job deployment to be reconfigured without building a new binary of the job. This is useful, for example, when managing issues in production and a job needs to be quickly reconfigured; the job's configuration can be modified and the job redeployed with the same binary vs needing to touch the job's app descriptor and building a new version of the binary.

For that reason it seems contradictory to merge values for the same key between original and generated config...

A bit more generally begs the question of how we treat precedence between original config, rewritten config, and generated configs (from app descriptor). @kw2542 Since you are working on the deployment flow, can you comment on what the touch points are in the system currently for rewriting configs? Do we have a clear picture of what this precedence is now?

@rmatharu-zz
Copy link
Contributor

rmatharu-zz commented Jun 11, 2020

".... input information from the application descriptor will be overrided by the information from RegExTopicGenerator or task.inputs...."
Isnt this desired behavior ?
If a user provides conflicting values for inputs in the two places, we resolve the conflict in favor of the one in the config.
I'm not sure why this is problematic.

@mynameborat
Copy link
Contributor

".... input information from the application descriptor will be overrided by the information from RegExTopicGenerator or task.inputs...."
Isnt this desired behavior ?
If a user provides conflicting values for inputs in the two places, we resolve the conflict in favor of the one in the config.
I'm not sure why this is problematic.

+1; This PR negates - #1065

@alnzng
Copy link
Contributor Author

alnzng commented Jun 11, 2020

Clarify the issue here:

If the user’s non-legacy application enabled RegExTopicGenerator and specified the input streams in its application descriptor, the expectation from the user side should be the application that can consume messages from specified input streams and Kafka topics that matched specified regex pattern. However, in current logic seems the input information from the application descriptor will be overdried by the information from RegExTopicGenerator, which means the user’s application can only consume from matched Kafka topics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants