ocrd_page_generateds validator messages should contain pageId #486

Open
bertsky opened this issue May 18, 2020 · 1 comment
bertsky commented May 18, 2020

The most recent generateDS PAGE-XML model now validates type restrictions, which is laudable. But the resulting messages are aggregated in a way that makes it impossible to diagnose where an error came from: they carry no reference to the page or file concerned.

Example:

--------------------------------------------------
----- Warnings -- count: 1 -----
Warning: Value "217,2873 212,2878 190,2875 186,2878 170,2879 135,2879 130,2875 104,2878 83,2874 -2,2873 -5,2876 -5,2906 6,2910 74,2913 388,2915 426,2917 432,2921 488,2914 580,2917 587,2922 596,2922 606,2917 723,2914 849,2918 855,2923 870,2923 876,2918 910,2916 975,2918 987,2924 996,2924 1004,2920 1065,2917 1207,2917 1264,2920 1271,2925 1284,2925 1291,2920 1335,2918 1571,2920 1584,2913 1584,2883 1581,2880 1514,2881 1478,2885 1434,2884 1425,2880 1398,2883 1394,2879 1383,2879 1379,2883 1335,2880 1331,2884 1314,2885 1232,2884 1227,2879 1207,2878 1203,2882 1192,2882 1189,2879 1172,2881 1169,2878 1140,2878 1136,2882 1123,2883 1091,2882 1083,2878 1041,2882 1035,2879 1013,2878 1009,2882 877,2883 832,2882 824,2877 805,2877 801,2881 785,2881 779,2877 748,2879 734,2879 731,2876 709,2879 706,2876 694,2876 688,2880 664,2881 635,2881 632,2878 597,2881 589,2880 585,2876 574,2876 570,2880 540,2876 488,2875 483,2878 441,2880 392,2880 389,2877 368,2876 364,2880 306,2881 241,2878 237,2873" does not match xsd pattern restrictions: [['^(([0-9]+,[0-9]+ )+([0-9]+,[0-9]+))$']]
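
For what it's worth, the mismatch in this particular value appears to come from the negative coordinates (`-2,2873`, `-5,2876`, `-5,2906`), since the pattern only admits `[0-9]+`. A minimal sketch to check that, using only Python's `re` module; the helper name `offending_points` and the shortened excerpt are made up for illustration and are not part of ocrd_page_generateds:

```python
import re

# Pattern copied from the warning above
PATTERN = re.compile(r'^(([0-9]+,[0-9]+ )+([0-9]+,[0-9]+))$')

# Hypothetical helper: list the coordinate pairs that break the pattern
PAIR = re.compile(r'[0-9]+,[0-9]+')

def offending_points(points):
    """Return the space-separated pairs that are not of the form N,N."""
    return [p for p in points.split(' ') if not PAIR.fullmatch(p)]

# Shortened excerpt of the value from the warning
excerpt = "104,2878 83,2874 -2,2873 -5,2876 -5,2906 6,2910"

print(PATTERN.match(excerpt))     # None, i.e. the restriction is violated
print(offending_points(excerpt))  # the three pairs with a negative x
```

This still does not say which page or element the value belongs to, which is the actual problem here.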

Note that this happens not only during validation, but also during normal processing.

IMO what needs to be done is to provide a custom GdsCollector class that controls how and when messages are printed. Our own implementation can then take care of:

  • showing the page_id, file ID and fileGrp (if necessary)
  • using the proper logging mechanism
  • maybe stopping processing (by raising an exception)
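
A rough sketch of what such a collector could look like. This is an assumption-laden illustration, not the actual implementation: it assumes the generated code reports each problem by calling `add_message(msg)` on the collector, and the names `ContextCollector`, `PageValidationError` and the `strict` flag are hypothetical. How the collector gets wired into the generated module is left out.

```python
import logging

class PageValidationError(Exception):
    """Hypothetical exception for aborting on the first validation message."""

class ContextCollector:
    """Hypothetical collector that adds page/file context to each message.

    Assumes the generated model calls add_message(msg) for every
    violated restriction.
    """

    def __init__(self, page_id=None, file_id=None, file_grp=None,
                 logger=None, strict=False):
        self.context = [('pageId', page_id), ('file', file_id),
                        ('fileGrp', file_grp)]
        self.logger = logger or logging.getLogger('ocrd_page_validation')
        self.strict = strict
        self.messages = []

    def _format(self, msg):
        # Only show the context fields that were actually provided
        parts = ['%s=%s' % (k, v) for k, v in self.context if v is not None]
        prefix = '[%s] ' % ' '.join(parts) if parts else ''
        return prefix + msg

    def add_message(self, msg):
        text = self._format(msg)
        self.messages.append(text)
        # Use the logging mechanism instead of printing at the end
        self.logger.warning(text)
        if self.strict:
            raise PageValidationError(text)

    def get_messages(self):
        return self.messages

    def clear_messages(self):
        self.messages = []
```

With something like `ContextCollector(page_id="PHYS_0001", file_id="FILE_0001", file_grp="OCR-D-SEG")` (IDs invented here), every warning would be logged immediately with its origin, and `strict=True` would cover the "stop processing" case.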

bertsky commented Oct 9, 2020

#576 addresses this
