Add support for collating sequences on indexed files #132
Hmm, that broke the NIST testsuite. See 12.3.5.3 in ISO/IEC 1989:2014 :
|
Found the answer in 12.4.4.3:
So this is by default "native", not "program collating". BUT: before adjusting the code, please add a minimal version of the failing NIST test (ideally written from scratch) to our internal testsuite; it should fail before your adjustment and pass afterwards. (Note that this is also one of the fixed file attributes, which in theory should be checked; we don't do that up to 4.x, and don't check it there either...) |
ccf423d to f840670
Codecov Report
Attention:
@@                  Coverage Diff                  @@
##           gcos4gnucobol-3.x     #132      +/-   ##
=====================================================
+ Coverage              65.78%   65.86%   +0.08%
=====================================================
  Files                     32       32
  Lines                  59416    59481      +65
  Branches               15694    15708      +14
=====================================================
+ Hits                   39087    39178      +91
+ Misses                 14311    14283      -28
- Partials                6018     6020       +2
|
I made the necessary changes. I also added a new As for the tests, I reworked them and included a dummy program collating sequence (taken from the failed NIST test), which was sufficient to cause the testsuite to fail prior to fixing the bug. |
That sounds all good! Please check the test coverage of your changes, the codecov bot says:
188 sounds a bit much. |
f840670 to 2961e35
Fixed! Just had to resynchronize with upstream. |
There are some general changes (should not be too much work) and one change to cob_key that actually reduces the amount of changes (here the important part again):
Note: If you always place the key collation into each cob_key, then you don't need to have the file-collation in cob_file at all (if set then those will never be empty).
I'd also like to have support for key collation, as the runtime already handles that (just always specifying the same collation). This is possibly a bit more work; if you feel that's too much, then this part could be postponed.
d6f4c29 to 623e27a
Should be good. Though I left the key collating sequence for later (although I enabled the field to store the file collating sequence). |
623e27a to 7d4daef
Looks like we're nearly done; I think this can be checked in next week (and maybe a key collation will be available by then, too).
Note: I do have a bigger refactoring of the BDB code which I want to push at the start of next week. Should I wait for this PR to be finished and merged first, or would you possibly even prefer that I push the refactoring before? |
I plan to make the final changes to this PR on Monday (need to rest this weekend); if that prevents you from merging, I don't mind rebasing my PR on your changes. |
560cd05 to 8a6e7f0
Made the last changes, though there might be another one to do.
All other |
I have a strange feeling now... Can you please verify that:
* the testsuite has at least one entry with variable length keys (which test case is that?) for each SEARCH and INDEXED
* if missing please add it
* check if we do store the variable length in the key fields
I agree that a memcmp should only be used for testing equality or difference; otherwise either a numeric compare (if the key field is numeric - the byte order could be non-native!) or a collation-aware check needs to be used.
|
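To illustrate the difference between a plain byte-wise comparison and a collation-aware check, here is a minimal C sketch. It is not the runtime's actual comparison function: the names `collated_key_compare` and `make_fold_case_table` are hypothetical, and the sketch assumes fixed-length keys and a 256-entry table that maps each byte to its collation weight (a NULL table standing for the native sequence).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a GnuCOBOL runtime function): compare two
   fixed-length keys through a 256-entry collation table.
   Returns <0, 0 or >0 like memcmp. */
int
collated_key_compare (const unsigned char *k1, const unsigned char *k2,
		      size_t len, const unsigned char *col)
{
	size_t i;

	assert (k1 != NULL && k2 != NULL);
	if (col == NULL) {
		/* assumption: no table means native collation, plain byte order */
		return memcmp (k1, k2, len);
	}
	for (i = 0; i < len; i++) {
		/* compare collation weights, not raw byte values */
		int diff = (int) col[k1[i]] - (int) col[k2[i]];
		if (diff != 0) {
			return diff;
		}
	}
	return 0;
}

/* Illustrative table only: gives ASCII lower-case letters the same
   weight as their upper-case counterparts. */
void
make_fold_case_table (unsigned char table[256])
{
	int i;

	for (i = 0; i < 256; i++) {
		table[i] = (unsigned char) i;
	}
	for (i = 'a'; i <= 'z'; i++) {
		table[i] = (unsigned char) (i - 'a' + 'A');
	}
}
```

With the case-folding table, "abc" and "ABC" compare equal and "abc" sorts before "ABD"; with a NULL table the raw byte order applies and "abc" sorts after "ABD" in ASCII. This is why a byte-wise compare is only safe for equality tests under the native sequence.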
8a6e7f0 to 2969913
Doesn't seem to be checked - but shouldn't I address this in a different PR?
Oh, wait... I believe non-display numeric keys in indexed files are already incorrectly ordered... Just tested with a primary key of usage |
That was my feeling. Please address the numeric key sorting, if possible.
I _think_ this should work (and hopefully be tested) in SEARCH already.
As both can use the same comparison logic (just using a different set of collations), I think it's useful to test that.
Testing variable-length keys belongs to this PR if they were previously handled completely by BDB and are now handled in our own search function.
Note: those have to be ordered as in COBOL, which may be different from BDB.
|
I see two ways to fix that:
In any case, I believe this requires some thinking before actually doing it, and I'm not sure it belongs in this PR (as it did not introduce that problem).
Tested, seems to work fine for SEARCH.
Could you elaborate?
I'm not sure I understand; could you rephrase that? |
that would be enough for now and is likely "the most portable" solution. But that creates a HUGE problem: if we do that, then the files need to be rebuilt :-/
Note that using the key type and use the appropriate COBOL sort function (if the field structure within ->app_data
Otherwise we could have small-enough numbers be converted to big-endian binary and bigger ones to be converted to USAGE DISPLAY SIGN SEPARATE LEADING in
As LMDB does not provide an external compare function, the second solution seems to be the only one that could be applied there (but note that LMDB exists only in trunk and even then is marked as experimental and possibly will be dropped).
Note: with ISAM there are key types that handle this (including IEEE 754), but we currently only use
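The idea of converting small-enough numbers to big-endian binary, so that a byte-wise key compare orders them numerically, can be sketched in C as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the key is assumed to be a signed 32-bit value, and the sign-bit flip is an addition of this sketch (it is what makes negative values sort before positive ones); none of this is taken from the PR's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoder: store a signed 32-bit key so that memcmp order
   equals numeric order, independent of the host byte order. */
void
encode_sortable_s32 (int32_t value, unsigned char out[4])
{
	/* assumption of this sketch: flipping the sign bit maps
	   INT32_MIN..INT32_MAX onto 0..UINT32_MAX, preserving order */
	uint32_t u = (uint32_t) value ^ UINT32_C (0x80000000);

	assert (out != NULL);
	/* most significant byte first (big-endian) */
	out[0] = (unsigned char) (u >> 24);
	out[1] = (unsigned char) (u >> 16);
	out[2] = (unsigned char) (u >> 8);
	out[3] = (unsigned char) u;
}

/* Compare two values the way a backend without a custom compare
   function would: byte by byte on the encoded keys. */
int
compare_encoded_s32 (int32_t a, int32_t b)
{
	unsigned char ka[4], kb[4];

	encode_sortable_s32 (a, ka);
	encode_sortable_s32 (b, kb);
	return memcmp (ka, kb, sizeof ka);
}
```

Because the encoded form differs from the native storage of the field, existing files written with native keys would not match it, which is the rebuild problem mentioned above.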
Hm. Please add a test with expected failure and a comment about the underlying problem.
BDB compares keys. Those are set up from
So... do you add key collations in cobc now and skip setting the collation if the field is numeric (potentially also skip if it is NATIONAL)? That should provide enough for the original issue.
The new indexed_compare is our specialized function for our case - and we cannot even support two different sizes, as we don't have them when the specialized function is used.
For the runtime warning - I think that should be an error, but we don't expect it to be seen by a user, so remove the translation part and add a coverage exclusion, like
    /* LCOV_EXCL_START */
    if (k++ == MAX_MODULE_ITERS) { /* prevent endless loop in case of broken list */
        /* not translated as highly unexpected */
        cob_runtime_warning ("max module iterations exceeded, possible broken chain");
        break;
    }
    /* LCOV_EXCL_STOP */
|
So, both solutions would imply rebuilding the files. What I like about the first one is that it uses the backend's mechanisms intended for this purpose (i.e. a custom key compare function) - but it has to be implemented specifically for each backend, and not all of these backends expose such features. And what I like about the second one is that it is (I think?) backend-agnostic.
I don't think we would ever have to translate the "encoded" keys backward, right?
Isn't
Even if slightly off-topic: I really like the idea of an LMDB backend. What would be the reasons for dropping it? Not performing as expected?
I guess I'll do that.
Well, I think this is handled in
We agree that NUMERIC DISPLAY should be stored lexicographically, right?
Got it. As an alternative to an embedded "marker", how about an extra file for such metadata?
I think that's what my code does now, except that it uses the file collation for every key - but I guess with all the things to do/fix we discussed here, I'd better implement complete support for (alphanumeric) key collations right now so we don't have to come back to it later.
Alright, I'll change that.
Will do. |
fab88f5 to 6939a31
Rebased, added key collations and a test with expected failure regarding the numeric keys sorting problem. |
while reviewing: I've just checked in the refactoring |
apart from the minor review comments and the rebase: that's beautiful - please: go, go, go
6939a31 to ba00023
All changes accounted for - will merge to SVN if it's okay (I see the tag is already there). |
ba00023 to b594e0e
I trust your changes will be fine - as the test is already set: go on with the commit.
Thanks!
|
Merged on SVN. |
@ddeclerck Thank you for working on this. |
Ah, I overlooked that - sorry. Hopefully this was an easy fix. Thanks. |
This PR adds support for COLLATING SEQUENCE on indexed files, as mentioned in https://sourceforge.net/p/gnucobol/feature-requests/459/.
Note this has only been done for the BDB backend - that's what our client needs.