Update collation generator #356

sven-oly · 2024-12-11T20:47:40Z

Also update characterizations for collation.

echeran · 2024-12-12T22:31:03Z

verifier/testreport.py

+                        if value not in results[key]:
+                            results[key][value] = set()
+                        results[key][value].add(label)


This can be simplified by keeping only the last line. This works because of your earlier change to initialize results with defaultdict, like so: results = defaultdict(lambda : defaultdict(list)). See the code above for an example, and see here for an explanation.

Suggested change

-                        if value not in results[key]:
-                            results[key][value] = set()
-                        results[key][value].add(label)
+                        results[key][value].add(label)
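
To illustrate why the membership check becomes redundant, here is a minimal sketch of the nested defaultdict pattern. It is not the verifier's actual code: the `observations` triples are hypothetical stand-ins for the real test data, and the innermost factory is assumed to be `set` (rather than `list`) because the line under review calls `.add()`, which lists do not provide.

```python
from collections import defaultdict

# Assumption: the innermost factory is `set`, since the reviewed code calls .add().
results = defaultdict(lambda: defaultdict(set))

# Hypothetical (key, value, label) triples standing in for real test data.
observations = [
    ("locale", "en", "0001"),
    ("locale", "en", "0002"),
    ("locale", "fr", "0003"),
]

for key, value, label in observations:
    # No "if value not in results[key]" guard is needed:
    # a missing key or value is created on first access.
    results[key][value].add(label)

# Convert to plain dicts for display.
print({key: dict(values) for key, values in results.items()})
```

One thing to keep in mind with this pattern: merely reading `results[some_key]` creates an empty entry, so lookups that should not mutate the structure are better written with `in` or `.get()`.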

echeran · 2024-12-12T22:32:37Z

verifier/testreport.py

+                        if key == 'input_list':
+                            if 'input_size' not in results:
+                                results['input_size'] = {}
+                            else:
+                                results['input_size'].add(len(value))
+                        if key == 'rules':
+                            value = 'RULE'  # A special case to avoid over-characterization
+                        if key not in results:
+                            results[key] = {}
+                        try:
+                            if not results[key].get(value, None):
+                                results[key][value] = set()
+                            results[key][value].add(label)


See if you can ensure that input_data is initialized with defaultdict wherever that initialization occurs, and then simplify the code here.

Trying this now.
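
As a rough idea of where that simplification could land, the sketch below rewrites the characterization loop on top of a nested defaultdict. Everything here is an assumption made for illustration: the function name `characterize`, the shape of `records` (pairs of label and input_data dict), and the choice to map each input size to the set of labels that have it. The diff above does not show the surrounding loop or the `except` clause, so this is not a drop-in replacement.

```python
from collections import defaultdict


def characterize(records):
    """Group test labels by each (key, value) pair found in the test inputs.

    `records` is a hypothetical iterable of (label, input_data) pairs,
    where input_data is a dict of test parameters.
    """
    results = defaultdict(lambda: defaultdict(set))
    for label, input_data in records:
        for key, value in input_data.items():
            if key == "input_list":
                # A list is unhashable and cannot be a dict key, so record
                # its length under a separate key (assumed behavior).
                results["input_size"][len(value)].add(label)
                continue
            if key == "rules":
                value = "RULE"  # Special case to avoid over-characterization
            results[key][value].add(label)
    return results


# Hypothetical sample data.
sample = [
    ("0001", {"rules": "&a<b", "strength": "primary"}),
    ("0002", {"input_list": ["a", "b", "c"], "strength": "primary"}),
]
summary = characterize(sample)
print(sorted(summary.keys()))
```

With the defaultdict in place, the `if key not in results` and `.get(value, None)` guards disappear, and the first item seen for a key is recorded just like every later one.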

echeran · 2024-12-12T22:34:40Z

testgen/generators/collation_short.py

@@ -142,7 +142,7 @@ def check_parse_compare(self, line_index, lines):
                raw_string2 = is_comparison_match.group(3)
                string2 = ''
                try:
-                    string2 = raw_string2.encode().decode("unicode_escape")
+                    string2 = raw_string2  # Don't do any unescaping


Hmm, if we're not unescaping the escape sequences, then how are we comparing the intended characters in the string?...

I will look at handling these in the test driver, making sure that we do indeed send the right characters to the executors.
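
For context on what is at stake, this small sketch shows what the removed `encode().decode("unicode_escape")` call does to a string, and one known pitfall of that idiom with text that already contains non-ASCII characters. The helper name `unescape` and the sample strings are hypothetical. Whether that pitfall motivated the change in this PR is not stated in the thread.

```python
def unescape(raw: str) -> str:
    """Apply the idiom removed in the diff above (hypothetical helper)."""
    return raw.encode().decode("unicode_escape")


# Escape sequences as they might appear in a text data file:
# twelve literal characters, not two.
raw = r"\u0061\u0301"
assert len(raw) == 12

# Unescaping turns them into the intended characters: "a" plus a combining acute.
assert unescape(raw) == "a\u0301"
assert len(unescape(raw)) == 2

# Pitfall: str.encode() produces UTF-8 bytes, but "unicode_escape" reads
# non-escape bytes as Latin-1, so text that is already non-ASCII is mangled.
literal = "\u00e9"  # e with acute accent
assert unescape(literal) == "\u00c3\u00a9"
assert unescape(literal) != literal
```

If the generator leaves the escapes untouched, some later stage (the test driver, per the reply above) has to perform the unescaping so that executors compare the intended characters rather than the literal backslash sequences.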

echeran

please fix

Co-authored-by: Elango Cheran <[email protected]>

echeran

LGTM. Filing #358 to capture the unaddressed comments for later follow-up.

sven-oly added 10 commits December 4, 2024 13:36

Better parsing of collationtest.txt data

e37abd5

Fixing how collation rules are parsed. Removes many errors

c8e26ce

Minor fixes

a5b984c

Minor fixes

b2f7ed2

Merge remote-tracking branch 'origin/update_collation_generator' into update_collation_generator

fea33c7

String spaces in rules

c163569

Fix logic of resetting rule set

296ccf0

fix utf-8

8033705

Merge remote-tracking branch 'upstream/main' into update_collation_generator

c1b18e3

Remove encode/decode from test generation. Add characterization options.

2457763

sven-oly assigned echeran Dec 11, 2024

echeran reviewed Dec 12, 2024

echeran requested changes Dec 12, 2024

Update verifier/testreport.py

9b33dfd

Co-authored-by: Elango Cheran <[email protected]>

echeran mentioned this pull request Dec 14, 2024

Simplify verifier code using defaultdict behavior #358

Open

echeran approved these changes Dec 14, 2024

sven-oly merged commit aeba024 into unicode-org:main Dec 14, 2024
6 checks passed
