Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update collation generator #356

Merged
merged 11 commits into from
Dec 14, 2024

Conversation

sven-oly
Copy link
Collaborator

Also update characterizations for collation.

Comment on lines +665 to +667
if value not in results[key]:
results[key][value] = set()
results[key][value].add(label)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified by only keeping the last line. This works because of the previous change you made to initialize results using defaultdict like so: results = defaultdict(lambda : defaultdict(list)). See code above for an example, see here for an explanation.

Suggested change
if value not in results[key]:
results[key][value] = set()
results[key][value].add(label)
results[key][value].add(label)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +764 to +776
if key == 'input_list':
if 'input_size' not in results:
results['input_size'] = {}
else:
results['input_size'].add(len(value))
if key == 'rules':
value = 'RULE' # A special case to avoid over-characterization
if key not in results:
results[key] = {}
try:
if not results[key].get(value, None):
results[key][value] = set()
results[key][value].add(label)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if you can assure that wherever the initialization of input_data occurs uses defaultdict, and then simplify the code here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying this now.

@@ -142,7 +142,7 @@ def check_parse_compare(self, line_index, lines):
raw_string2 = is_comparison_match.group(3)
string2 = ''
try:
string2 = raw_string2.encode().decode("unicode_escape")
string2 = raw_string2 # Don't do any unescaping
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, if we're not unescaping the escape sequences, then how are we comparing the intended characters in the string?...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look at handling these in the test driver, making sure that we do indeed send the right characters to the executors.

Copy link
Collaborator

@echeran echeran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix

Co-authored-by: Elango Cheran <[email protected]>
Copy link
Collaborator

@echeran echeran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, filing #358 to capture the unaddressed comments for later follow up

@sven-oly sven-oly merged commit aeba024 into unicode-org:main Dec 14, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants