Make encoder-marc21 more forgiving? #528
Comments
If I am not mistaken, MF in general has a "make or break" approach to transforming things, especially the |
I separated the broken records from the valid ones, e.g.:
"6500\x1e":
"7": ""
"0": "(DE-588)4057379-5"
"0": "(DE-101)040573796"
a: "Steroide"
"2": "gnd"
"650d\x1e":
"7": ""
"0": "(DE-588)4039983-7"
"0": "(DE-101)040399834"
a: "Molekularbiologie"
"2": "gnd"
"650d\x1e":
"7": ""
"0": "(DE-588)4067488-5"
"0": "(DE-101)040674886"
a: "Zeitschrift"
"2": "gnd"
"650d\x1e":
"7": ""
"0": "(DE-588)4057379-5"
"0": "(DE-101)040573796"
a: "Steroide"
"2": "gnd"
"650d\x1e":
"7": ""
"0": "(DE-588)4006777-4"
"0": "(DE-101)040067777"
a: "Biochemie"
"2": "gnd"
See here in the playground. You can spot the broken indicators in the YAML result. I also checked the broken records with
Longer report with
The separators in these examples seem to be broken. Let me see how Catmandu is handling it. |
I also tested the broken records with Catmandu. It seems that their MARC decoder, and not the encoder, handles the incoming data differently. It does not just skip the broken separators but drops the broken elements as a whole. Here it replaces the broken indicators with whitespace.
MF result transforming MARC into MARCXML (have a look at the indicator and the first subfield):
<marc:datafield tag="775" ind1="0" ind2="�">
<marc:subfield code="8"></marc:subfield>
<marc:subfield code="i">Online-Ausg.</marc:subfield>
<marc:subfield code="t">�The� journal of steroid biochemistry and molecular biology</marc:subfield>
<marc:subfield code="w">(DE-600)1482780-3</marc:subfield>
<marc:subfield code="w">(DE-101)019756801</marc:subfield>
</marc:datafield>
<marc:datafield tag="780" ind1="8" ind2="0">
<marc:subfield code="�">00</marc:subfield>
<marc:subfield code="i">Vorg.:</marc:subfield>
<marc:subfield code="t">�The� journal of steroid biochemistry</marc:subfield>
<marc:subfield code="w">(DE-600)80169-0</marc:subfield>
<marc:subfield code="w">(DE-101)010545514</marc:subfield>
</marc:datafield>
Catmandu result transforming MARC into MARCXML with:
<marc:datafield tag="775" ind1=" " ind2=" ">
<marc:subfield code="i">Online-Ausg.</marc:subfield>
<marc:subfield code="t">�The� journal of steroid biochemistry and molecular biology</marc:subfield>
<marc:subfield code="w">(DE-600)1482780-3</marc:subfield>
<marc:subfield code="w">(DE-101)019756</marc:subfield>
</marc:datafield>
<marc:datafield tag="780" ind1=" " ind2=" ">
<marc:subfield code="i">Vorg.:</marc:subfield>
<marc:subfield code="t">�The� journal of steroid biochemistry</marc:subfield>
<marc:subfield code="w">(DE-600)80169-0</marc:subfield>
<marc:subfield code="w">(DE-101)0105</marc:subfield>
</marc:datafield>
I would be in favour of adjusting the behaviour of the decoder, as an option, so that it does not create broken values from a broken separator. |
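To make the proposed option a bit more concrete, here is a minimal sketch in Python of what a "lenient" decoding step could do with an already split data field. It is not how Metafacture or Catmandu implement this, and it does not reproduce the Catmandu output above exactly. The function name `sanitize_field` and the policy (blank out invalid indicators, drop subfields with invalid codes) are assumptions for illustration; the allowed character sets follow my reading of the MARC 21 record structure (blank, digits, or lowercase letters for indicators; digits or lowercase letters for subfield codes).

```python
import string

# Assumption: indicators may be blank, a digit, or a lowercase ASCII letter.
VALID_INDICATORS = set(" " + string.digits + string.ascii_lowercase)
# Assumption: subfield codes may be a digit or a lowercase ASCII letter.
VALID_SUBFIELD_CODES = set(string.digits + string.ascii_lowercase)


def sanitize_field(tag, ind1, ind2, subfields):
    """Return a cleaned copy of one data field (hypothetical helper).

    subfields is a list of (code, value) tuples. Invalid indicators are
    replaced by a blank; subfields whose code is invalid are dropped.
    """
    clean_ind1 = ind1 if ind1 in VALID_INDICATORS else " "
    clean_ind2 = ind2 if ind2 in VALID_INDICATORS else " "
    clean_subfields = [
        (code, value) for code, value in subfields if code in VALID_SUBFIELD_CODES
    ]
    return tag, clean_ind1, clean_ind2, clean_subfields


if __name__ == "__main__":
    # Field 775 from the MF result above: ind2 is not a valid indicator.
    print(sanitize_field("775", "0", "\ufffd", [("8", ""), ("i", "Online-Ausg.")]))
    # Field 780 from the MF result above: the first subfield code is broken.
    print(sanitize_field("780", "8", "0", [("\ufffd", "00"), ("i", "Vorg.:")]))
```

Whether dropping a whole subfield is acceptable probably depends on the use case, which is one reason to make such behaviour opt-in rather than the default.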
I try to follow. But the playground example in #528 (comment) results in "Request-URI Too Long". |
Thanks for the hint. MF Playground does not complain anymore if the URL is too long. Should open a ticket there. I fixed the example and added some more info to my comments: #528 (comment) |
As I revised my comments: @dr0i in short: we should not change the behaviour of encode-marc21 but of
|
Came up in #527:
If we parse (presumably) crude binary MARC, the encoding fails. (The first broken MRC seems to be 02589nas a2200601 c 4500 in https://raw.githubusercontent.com/gbv/Catmandu-Tutorial/master/data/marc.mrc. This should be double-checked with a MARC validator other than MF. Because the binary MARC directory of field 787 points to Iso646Constants.INFORMATION_SEPARATOR_2 = 0x1e, the encoding breaks.)
If an encoding breaks, not only the field or the whole record is dumped, but the whole stream. The dumping of the record and, more importantly, of the whole stream can be avoided by piping decode-marc21 to catch-stream-exception before piping to encode-marc21.
a) If the record is indeed invalid:
aa) Shall we make encode-marc21 more forgiving?
ab) Or is it enough to bail out (as it is at the moment), i.e. to expect the user to use catch-stream-exception or to fix the invalid MARC?
b) If the record is valid: fix encode-marc21.
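For double-checking a suspicious record outside of MF, the following is a rough sketch in Python of a directory check for binary MARC 21. It assumes the usual layout (24-byte leader with the base address at positions 12-16, 12-byte directory entries of tag, length, and start, 0x1e as field terminator, 0x1d as record terminator). The function names `check_directory` and `check_stream` are made up for this example, and the checks are only one possible reading of "the directory points to 0x1e". The per-record `try`/`except` plays a role comparable to putting catch-stream-exception into the pipeline: one bad record is reported and the rest of the stream is still processed.

```python
FIELD_TERMINATOR = 0x1E   # same value as INFORMATION_SEPARATOR_2 mentioned above
RECORD_TERMINATOR = 0x1D


def check_directory(record: bytes):
    """Return a list of problems found in the directory of one binary record."""
    problems = []
    base = int(record[12:17])          # base address of data, from the leader
    if record[base - 1] != FIELD_TERMINATOR:
        problems.append("directory is not closed by a field terminator")
    directory = record[24:base - 1]
    if len(directory) % 12 != 0:
        problems.append("directory length is not a multiple of 12")
    for offset in range(0, len(directory) - 11, 12):
        entry = directory[offset:offset + 12]
        tag = entry[0:3].decode("ascii", "replace")
        length = int(entry[3:7])
        start = int(entry[7:12])
        field = record[base + start:base + start + length]
        if not field or field[-1] != FIELD_TERMINATOR:
            problems.append(f"field {tag}: does not end with 0x1e")
        elif FIELD_TERMINATOR in field[:-1]:
            problems.append(f"field {tag}: contains 0x1e before its end")
    return problems


def check_stream(data: bytes):
    """Check every record; a record that cannot be parsed does not stop the rest."""
    results = []
    for chunk in data.split(bytes([RECORD_TERMINATOR])):
        if not chunk:
            continue
        record = chunk + bytes([RECORD_TERMINATOR])
        try:
            results.append(check_directory(record))
        except (ValueError, IndexError) as error:
            results.append([f"unparsable record: {error}"])
    return results
```

Calling `check_stream(open("marc.mrc", "rb").read())` on a local copy of the file would give one list of problems per record; an empty list means these particular checks found nothing, not that the record is valid in every respect.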