-
-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DRAFT]Handling error response when a comma is present in the operand name of tags #107
base: main
Are you sure you want to change the base?
Conversation
I like the idea of having cucumber notifying the user of misconfiguration issues. However, the proposed exception message (Tag expression "%s" could not be parsed because of syntax error: Tag names should follow this pattern "%s"") is a bit rough. As the natural understanding of ”@tag1,@tag2" is a an "or/and" combination of both tags, the exception message could be friendlier and suggest to use one of the operators (e.g. something like "It looks like your are trying to combine tags using comma (','), which is not supported as tag combination operators. Please replace it by either 'or' or 'and' operators, e.g. '@tag1 or @tag2'."). The PR missed unit tests. |
@@ -106,6 +108,7 @@ private static List<String> tokenize(String expr) { | |||
isEscaped = true; | |||
} else if (c == '(' || c == ')' || Character.isWhitespace(c)) { | |||
if (token.length() > 0) { | |||
isOperandValid(token,expr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of the method isOperandValid()
is actually wrong here.
Correct, that is your intention but also operator-strings like and
, but
and not
are checked by this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @jenisys , for your review and feedback. I have made the necessary adjustment by updating the method name. As you rightly mentioned, apart from operands, this method will also handle validations for operators. Please confirm if the changes meet the requirements. Your continued assistance is appreciated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method-name is now OK for me 👍
Thank you, @jkronegg, for your valuable comment and thorough review. I have enhanced the error message with additional details to provide clearer guidance. Additionally, I have included relevant unit tests to ensure proper validation. Please review the changes and kindly confirm if they meet the requirements. Your continued support is greatly appreciated. |
HINTS:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR prevents the use of commas inside tag expressions altogether.
However comma's are valid characters inside a tag. Consequently they must be a valid character to use in a tag expression. For example a scenario may be tagged with @a1,b2
. Which could then be selected by the tag expression @a1,b2
@a1,b2
Scenario: Example 1
Likewise a scenario might be tagged with @a1,
and @b2
. This should be selectable through the tag expression @a1, and @b2
.
@a1,@b2
Scenario: Example 2
@a1, @b2
Scenario: Example 3
The intent of this PR is to inform users they are using a tag-expression syntax from v1 while using v2. Currently I do not believe there way to accept all valid v2+ tag expressions while rejecting all v1 expressions. There may however the possibility of validating literals.
A tag in Gherkin is delimited by @
and Gherkin does not allow the @
to be escaped. So every tag matches the pattern ^@[^@]+
. So any tag expression literal that contains a @
after the starting @
can not possibly target a gherkin tag.
For example
@a1@b2
Scenario: Example 4
This scenario is tagged with @a1
and @b2
rather than @a1@b2
. So the tag expression @a1@b2
will never find any valid scenarios and could be considered invalid.
This will incidentally also capture any use of the v1 tag expressions of the form @a1,@b2
and also ~@a1
.
Proposal:
- Validate that Gherkin tags do indeed follow the
^@[^@]+
pattern - Validate literals in tag expressions to match the
^@[^@]+
pattern. - (Optional) Provide the user with examples of correct tag expressions in the error message. Note this should be done for all parse errors, not just the one's we are adding now.
@@ -11,7 +11,7 @@ | |||
|
|||
public final class TagExpressionParser { | |||
//regex for token to ensure no token has ',' in them later can be customized further | |||
private final static String VALID_TOKEN = "(?:@[^@]*|and|or|not|/(|/))$"; | |||
private final static String VALID_TOKEN = "^(?:@[^@]*|and|or|not|\\(|\\))$"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By trying to validate all tokens rather than the litterals you are making this more complex than needed. Considering that this implementation will have to be repeated for a few languages, perhaps a review in the middle of just the Java implementation can save some duplication of effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @mpkorstanje, for your reply
I assume that a tag expression consists of operands and operators. When splitting the expression, there are two possibilities. The current token or string could be an operator, which includes "(", ")", "and", "or", and "not". Alternatively, it could be an operand, which should adhere to the Gherkin standard of starting with "@" and not containing "@" anywhere else.
To handle this, I have used a regex pattern to ensure that only valid tokens, which form the expression, are evaluated. By considering both operands and operators, it simplifies the code and eliminates the need for additional checks to determine the token type.
I believed this approach reduces complexity and improves the readability of the code, but maybe I might be missing someting, I would appreciate your opinion on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, actually the situation is slightly different (as you can see from the tests / testdata):
- Tags in a tag-expression may either be
@foo
(with-leading-atsign) orfoo
(without-leading-atsign) @
(atsign) should not occur in a tag (after the first char) in a tag-expression- Gherkin parser removes the
@
(atsign) from tags for Features/Rules/Scenarios/… . Therefore,@foo
becomes“foo“
internally
SEE ALSO:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gherkin parser removes the @ (atsign)
@jenisys are you sure about that? Or is that true only for some Gherkin parsers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but maybe I might be missing someting, I would appreciate your opinion on this.
@shivam-sehgal without repeating myself that will be a bit difficult.
But do consider validating only the litterals. In essence, do not put the validation in the middle of tokenizer but rather after tokenization, just prior to litteral creation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mpkorstanje cucumber-jvm basically calls this library to parse the tag expression as well as to validate if the tag expression is valid or not so the flow is like this in cucmberOptions annotation user provides the tags expression shown in the below code
# — FILE: SomeClass.java
@CucumberOptions(features = "src/it/resources/features",
glue = {"com.adobe.aep.devex.exim.integrationtests.stepDefinitions",
"com.adobe.aep.devex.exim.integrationtests.config",
"com.adobe.aep.devex.exim.integrationtests.hooks"
},
plugin = {"pretty", "html:target/cucumber-html-report.html",
"json:target/cucumber.json",
"rerun:target/cucumber-api-rerun.txt"
},
tags = "not @ignore and @hello" // Exclude the feature with the "@ignore" tag
)
…
this annotation is parsed by CucumberOptionsAnnotationParser.class.
addTags method in this class is called which then calls this repo to validate the tag expression, now the intent of this PR was users by mistake put tag expressions like @tag1,@tag2 in the tags section in the CucumberOptions, and they don't get any error message, to avoid that we need to validate the tag expression not having any tags or literals like @tag,@againInTag
to prevent this confusion and give users a valid error response so they can understand that they need to use logical operator, instead of commas, I can put this tag validation for the provided tag expression in addTags method of the CucumberOptionsAnnotationParser clas i.e in cucumber jvm, but my mind says it's better to keep validation logic inside this tag expression validation repo only to follow the single responsibility principle of oops,
to address this concern with the least impacting change, for now, I see if there is an expression having a tag starting with @
that tag shouldn't have @
after first, it will prevent this confusion with a proper error message letting user understand that they are doing mistake and cucumber is reading their expression as a single tag, and also won't affect any case where we allow the literals that are not starting with @
In short, cucumber-jvm calls this repo to validate the tag expression passed by the user and uses a message from TagExpressionException
to let the user know with a message what is wrong with the expression, the message should be coming from here i.e from this repo's java directory, else the responsibility of this code we have to handle separately in cucumber-jvm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One solution approach could be:
- if
CucumberOptions
had an optional parameter, liketag_expression_validator
,tag_expression_parser
or antag-expression preprocess hook
you could perform a pre-process check - Pre-process check would just split the
tag-expression
string value into words and check ifcomma
(s) are contained in any word - If a
comma
is contained, you issue a log-warning or raise an exception (whatever the „right“ consequent action is), …
This approach would have the following advantages:
- Strict checking is optional
- If you want/need strict checking of
tag-expressions
, you can plug-in the functionality
HINT:
By looking at the current implementation of the CucumberOptions
class, this needed parameter
would need to be added to support such a kind of solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jenisys, I find the approach of using an optional flag for literal validation quite appealing. However, I have some doubts about the term "strict checking." If libraries using this repository are enforcing Gherkins rules, then it seems reasonable for this library to do the same. However, if we believe and would like to continue supporting users who define literals not starting with @
in new versions, then the optional parameter approach makes sense.
One potential concern raised by @mpkorstanje is that checking for commas alone may cause issues. To differentiate between v1 and v2, perhaps we should consider checking for ^@[^@]+
which he mentioned if the optional strict validation flag is passed in CucumberOptions
.
I would appreciate hearing both of your opinions on this matter. Let's collaborate and I can begin working on it.
cc: @mpkorstanje
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the cost of the complexity is starting to exceed the value of the solution.
Perhaps a simpler solution such as improving the documentation in Cucumber JVM might help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @mpkorstanje for your advice, have opened a pull request to update the doc a little bit, to add info for users to avoid the confusion of separating tags using ,
, attaching the screenshot of the doc changes below
would like to request your review of this PR .
Thanks in advance
🤔 What's changed?
to handle tags cases like @Step1,@step2
adding regex to handle cases if the operand name of tags contains commas, in future this regex can be further extended
to handle any new case that will arise
⚡️ What's your motivation?
heavily use this library, and wanted to contribute.
🏷️ What kind of change is this?
I came across issue. Although the documentation specifies the expected format for tags, it seems that the pattern used by the user, '@Step1,@step2,' did not generate any error. This discrepancy has the potential to confuse users. To address this, it would be beneficial to implement a validation mechanism that throws an appropriate error when such patterns are encountered.
♻️ Anything particular you want feedback on?
📋 Checklist:
added support for regex filtering on operand names of tags
Limitations
1)Currently only handling case ',' in operand name
2)Currently just modified the java module of this lib
This text was originally generated from a template, then edited by hand. You can modify the template here.