Add UTF-8 BOM to files written to S3/Minio #567

Open: wants to merge 3 commits into base: master (showing changes from 2 commits)
12 changes: 10 additions & 2 deletions src/main/java/org/folio/dew/repository/BaseFilesStorage.java
@@ -41,6 +41,7 @@
 import java.io.BufferedReader;
 import java.io.BufferedWriter;
 import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.InputStreamReader;
@@ -177,16 +178,23 @@ public String upload(String path, String filename) throws IOException {
    * @throws IOException - if an I/O error occurs
    */
   public String write(String path, byte[] bytes, Map<String, String> headers) throws IOException {
+    byte[] bom = {(byte)0xEF, (byte)0xBB, (byte)0xBF};

Review comment (Contributor):
Perhaps it makes sense to make this a constant instead of initializing the array in the method each time?
private static final byte[] UTF8_BOM = new byte[]{(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

+
+    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
+    outputStream.write(bom);
+    outputStream.write(bytes);
+    byte[] bytesWithBom = outputStream.toByteArray();
+
     path = getS3Path(path);
     if (isComposeWithAwsSdk) {
       log.info("Writing with using AWS SDK client");
       s3Client.putObject(PutObjectRequest.builder().bucket(bucket)
           .key(path).build(),
-          RequestBody.fromBytes(bytes));
+          RequestBody.fromBytes(bytesWithBom));
       return path;
     } else {
       log.info("Writing with using Minio client");
-      try(var is = new ByteArrayInputStream(bytes)) {
+      try (var is = new ByteArrayInputStream(bytesWithBom)) {
         return client.putObject(PutObjectArgs.builder()
           .bucket(bucket)
           .region(region)
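The review suggestion above could be sketched roughly as follows. This is a minimal, standalone illustration and not the PR's actual code: the class name `BomSketch` and the helper `withBom` are hypothetical, and the sketch swaps the `ByteArrayOutputStream` used in the diff for `System.arraycopy`, which avoids the intermediate stream. Only the constant `UTF8_BOM` comes from the review comment.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical holder class, used only for illustration.
final class BomSketch {
  // Constant proposed in the review comment: the three-byte UTF-8 BOM.
  private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

  private BomSketch() {
  }

  /** Returns a new array containing the BOM followed by the given bytes. */
  public static byte[] withBom(byte[] bytes) {
    byte[] result = new byte[UTF8_BOM.length + bytes.length];
    // Copy the BOM first, then the payload right after it.
    System.arraycopy(UTF8_BOM, 0, result, 0, UTF8_BOM.length);
    System.arraycopy(bytes, 0, result, UTF8_BOM.length, bytes.length);
    return result;
  }

  public static void main(String[] args) {
    byte[] payload = "a,b\n".getBytes(StandardCharsets.UTF_8); // 4 bytes
    System.out.println(withBom(payload).length); // prints 7
  }
}
```

With a helper like this, `write` would pass `withBom(bytes)` to both the AWS SDK branch and the Minio branch, so the constant is allocated once per class rather than once per call.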
@@ -1,4 +1,4 @@
-Last updated,Original heading,New heading,Identifier,Original 1XX,New 1XX,Authority source file name,Number of bibliographic records linked,Updater
+Last updated,Original heading,New heading,Identifier,Original 1XX,New 1XX,Authority source file name,Number of bibliographic records linked,Updater
2023-10-01 12:00:00.000Z,"Charles, Prince of Wales","Charles III, King",n1234567,150,110,LC Name Authority file (LCNAF),105,"Admin, Diku"
2023-10-01 12:00:00.000Z,"Charles, Prince of Wales","Charles III, King",n1234567,150,110,LC Name Authority file (LCNAF),105,"Admin, Diku"
2023-08-01 12:00:00.000Z,"Charles III, King",King Charles III,mo34056,100,100,Not specified,10,"Test, Folio"
@@ -1,2 +1,2 @@
-Last updated,Original heading,New heading,Identifier,Original 1XX,New 1XX,Authority source file name,Number of bibliographic records linked,Updater
+Last updated,Original heading,New heading,Identifier,Original 1XX,New 1XX,Authority source file name,Number of bibliographic records linked,Updater
No records found
@@ -1,3 +1,3 @@
-Failed,Bibliographic title,Bibliographic UUID,Failed bib field update,Linked authority identifier,Reason for error
+Failed,Bibliographic title,Bibliographic UUID,Failed bib field update,Linked authority identifier,Reason for error
2019-08-24 14:15:22.000Z,First title,64d2028c-ae87-4069-a624-66089d957ef9,650,mo34056,Not found 404
2019-08-24 14:15:22.000Z,Second title,65d2028c-ae87-4069-a624-66089d957ef9,150,mo34057,Invalid value
@@ -1,4 +1,4 @@
-badHoldingsTypeId,Holdings type not found by id=1d51c4da-0d5f-4303-a922-3e9d7a9b22f3
+badHoldingsTypeId,Holdings type not found by id=1d51c4da-0d5f-4303-a922-3e9d7a9b22f3
badHoldingsCallNumberTypeId,Call number type not found by id=1d51c4da-0d5f-4303-a922-3e9d7a9b22f3
badHoldingsNoteTypeId,Note type not found by id=1d51c4da-0d5f-4303-a922-3e9d7a9b22f3
badIllPolicyId,Ill policy not found by id=1d51c4da-0d5f-4303-a922-3e9d7a9b22f3
@@ -1,3 +1,3 @@
-Holdings UUID,"Instance (Title, Publisher, Publication date)",Suppress from discovery,Holdings HRID,Source,Former holdings Id,Holdings type,Statistical codes,Administrative note,Holdings permanent location,Holdings temporary location,Shelving title,Holdings copy number,Holdings level call number type,Holdings level call number prefix,Holdings level call number,Holdings level call number suffix,Number of items,Holdings statement,Holdings statement for supplements,Holdings statement for indexes,ILL policy,Digitization policy,Retention policy,Notes,Electronic access,Acquisition method,Order format,Receipt status,Tags
+Holdings UUID,"Instance (Title, Publisher, Publication date)",Suppress from discovery,Holdings HRID,Source,Former holdings Id,Holdings type,Statistical codes,Administrative note,Holdings permanent location,Holdings temporary location,Shelving title,Holdings copy number,Holdings level call number type,Holdings level call number prefix,Holdings level call number,Holdings level call number suffix,Number of items,Holdings statement,Holdings statement for supplements,Holdings statement for indexes,ILL policy,Digitization policy,Retention policy,Notes,Electronic access,Acquisition method,Order format,Receipt status,Tags
0b1e3760-f689-493e-a98e-9cc9dadb7e83,title,,ho13,FOLIO,d1670310-ceac-47d9-aaba-aaeeb890bc07,Physical,"Book, print (books)",Test administrative note,Annex,Main Library,shelving title,copy number,LC Modified,prefix,call number,suffix,10,statement;statement public note;statement staff note,statement for supplements;statement for supplements public note;statement for supplements staff note,statement for indexes;statement for indexes public note;statement for indexes staff note,Limited lending policy,digitisation policy,retention policy,Note;a note;false;diku;b160f13a-ddba-4053-b9c4-60ec5ea45d56|Note;a note;true;diku;b160f13a-ddba-4053-b9c4-60ec5ea45d56,"URL relationship;URI;Link text;Materials specified;URL public note
Version of resource;www.someurl.com;link text;material;www.someurl.com",aquisition method,order format,receipt status,
@@ -1,4 +1,4 @@
-Holdings UUID,"Instance (Title, Publisher, Publication date)",Suppress from discovery,Holdings HRID,Source,Former holdings Id,Holdings type,Statistical codes,Administrative note,Holdings permanent location,Holdings temporary location,Shelving title,Holdings copy number,Holdings level call number type,Holdings level call number prefix,Holdings level call number,Holdings level call number suffix,Number of items,Holdings statement,Holdings statement for supplements,Holdings statement for indexes,ILL policy,Digitization policy,Retention policy,Notes,Electronic access,Acquisition method,Order format,Receipt status,Tags
+Holdings UUID,"Instance (Title, Publisher, Publication date)",Suppress from discovery,Holdings HRID,Source,Former holdings Id,Holdings type,Statistical codes,Administrative note,Holdings permanent location,Holdings temporary location,Shelving title,Holdings copy number,Holdings level call number type,Holdings level call number prefix,Holdings level call number,Holdings level call number suffix,Number of items,Holdings statement,Holdings statement for supplements,Holdings statement for indexes,ILL policy,Digitization policy,Retention policy,Notes,Electronic access,Acquisition method,Order format,Receipt status,Tags
855d4693-4087-4339-8c14-c25c350e5e59,title,,ho14,FOLIO,,,,,Main Library,,,,,,,,,,,,,,,,,,,,
ae4288c6-4999-492a-82f4-eceb3a6e1ec6,title,,ho14,FOLIO,,,,,Main Library,,,,,,,,,,,,,,,,,,,,
59b36165-fcf2-49d2-bf7f-25fedbc07e44,title,,ho14,FOLIO,,,,,Main Library,,,,,,,,,,,,,,,;a note;false;diku;null,,,,,
@@ -1,2 +1,2 @@
-789,No match found
+789,No match found
123,Duplicate entry
@@ -1,3 +1,3 @@
-456,Instance not found by hrid=456
+456,Instance not found by hrid=456
789,Instance not found by hrid=789
123,Duplicate entry
@@ -1,2 +1,2 @@
-789,Item not found by barcode=789
+789,Item not found by barcode=789
123,Duplicate entry
@@ -1,4 +1,4 @@
-Holdings UUID,"Instance (Title, Publisher, Publication date)",Suppress from discovery,Holdings HRID,Source,Former holdings Id,Holdings type,Statistical codes,Administrative note,Holdings permanent location,Holdings temporary location,Shelving title,Holdings copy number,Holdings level call number type,Holdings level call number prefix,Holdings level call number,Holdings level call number suffix,Number of items,Holdings statement,Holdings statement for supplements,Holdings statement for indexes,ILL policy,Digitization policy,Retention policy,Notes,Electronic access,Acquisition method,Order format,Receipt status,Tags
+Holdings UUID,"Instance (Title, Publisher, Publication date)",Suppress from discovery,Holdings HRID,Source,Former holdings Id,Holdings type,Statistical codes,Administrative note,Holdings permanent location,Holdings temporary location,Shelving title,Holdings copy number,Holdings level call number type,Holdings level call number prefix,Holdings level call number,Holdings level call number suffix,Number of items,Holdings statement,Holdings statement for supplements,Holdings statement for indexes,ILL policy,Digitization policy,Retention policy,Notes,Electronic access,Acquisition method,Order format,Receipt status,Tags
0b1e3760-f689-493e-a98e-9cc9dadb7e83,title,,ho13,FOLIO,d1670310-ceac-47d9-aaba-aaeeb890bc07,Physical,"Book, print (books)",Test administrative note,Annex,Main Library,shelving title,copy number,LC Modified,prefix,call number,suffix,10,statement;statement public note;statement staff note,statement for supplements;statement for supplements public note;statement for supplements staff note,statement for indexes;statement for indexes public note;statement for indexes staff note,Limited lending policy,digitisation policy,retention policy,Note;a note;false;diku;b160f13a-ddba-4053-b9c4-60ec5ea45d56|Note;a note;true;diku;b160f13a-ddba-4053-b9c4-60ec5ea45d56,"URL relationship;URI;Link text;Materials specified;URL public note
Version of resource;www.someurl.com;link text;material;www.someurl.com",aquisition method,order format,receipt status,
855d4693-4087-4339-8c14-c25c350e5e59,title,,ho14,FOLIO,,,,,Main Library,,,,,,,,,,,,,,,,,,,,
@@ -1,4 +1,4 @@
-Holdings UUID,"Instance (Title, Publisher, Publication date)",Suppress from discovery,Holdings HRID,Source,Former holdings Id,Holdings type,Statistical codes,Administrative note,Holdings permanent location,Holdings temporary location,Shelving title,Holdings copy number,Holdings level call number type,Holdings level call number prefix,Holdings level call number,Holdings level call number suffix,Number of items,Holdings statement,Holdings statement for supplements,Holdings statement for indexes,ILL policy,Digitization policy,Retention policy,Notes,Electronic access,Acquisition method,Order format,Receipt status,Tags
+Holdings UUID,"Instance (Title, Publisher, Publication date)",Suppress from discovery,Holdings HRID,Source,Former holdings Id,Holdings type,Statistical codes,Administrative note,Holdings permanent location,Holdings temporary location,Shelving title,Holdings copy number,Holdings level call number type,Holdings level call number prefix,Holdings level call number,Holdings level call number suffix,Number of items,Holdings statement,Holdings statement for supplements,Holdings statement for indexes,ILL policy,Digitization policy,Retention policy,Notes,Electronic access,Acquisition method,Order format,Receipt status,Tags
0b1e3760-f689-493e-a98e-9cc9dadb7e83,title,,ho13,FOLIO,d1670310-ceac-47d9-aaba-aaeeb890bc07,Physical,"Book, print (books)",Test administrative note,Annex,Main Library,shelving title,copy number,LC Modified,prefix,call number,suffix,10,statement;statement public note;statement staff note,statement for supplements;statement for supplements public note;statement for supplements staff note,statement for indexes;statement for indexes public note;statement for indexes staff note,Limited lending policy,digitisation policy,retention policy,Note;a note;false;diku;b160f13a-ddba-4053-b9c4-60ec5ea45d56|Note;a note;true;diku;b160f13a-ddba-4053-b9c4-60ec5ea45d56,"URL relationship;URI;Link text;Materials specified;URL public note
Version of resource;www.someurl.com;link text;material;www.someurl.com",aquisition method,order format,receipt status,
855d4693-4087-4339-8c14-c25c350e5e59,title,,ho14,FOLIO,,,,,Main Library,,,,,,,,,,,,,,,,,,,,
@@ -1,4 +1,4 @@
-Holdings UUID,"Instance (Title, Publisher, Publication date)",Suppress from discovery,Holdings HRID,Source,Former holdings Id,Holdings type,Statistical codes,Administrative note,Holdings permanent location,Holdings temporary location,Shelving title,Holdings copy number,Holdings level call number type,Holdings level call number prefix,Holdings level call number,Holdings level call number suffix,Number of items,Holdings statement,Holdings statement for supplements,Holdings statement for indexes,ILL policy,Digitization policy,Retention policy,Notes,Electronic access,Acquisition method,Order format,Receipt status,Tags
+Holdings UUID,"Instance (Title, Publisher, Publication date)",Suppress from discovery,Holdings HRID,Source,Former holdings Id,Holdings type,Statistical codes,Administrative note,Holdings permanent location,Holdings temporary location,Shelving title,Holdings copy number,Holdings level call number type,Holdings level call number prefix,Holdings level call number,Holdings level call number suffix,Number of items,Holdings statement,Holdings statement for supplements,Holdings statement for indexes,ILL policy,Digitization policy,Retention policy,Notes,Electronic access,Acquisition method,Order format,Receipt status,Tags
0b1e3760-f689-493e-a98e-9cc9dadb7e83,title,,ho13,FOLIO,d1670310-ceac-47d9-aaba-aaeeb890bc07,Physical,"Book, print (books)",Test administrative note,Annex,Main Library,shelving title,copy number,LC Modified,prefix,call number,suffix,10,statement;statement public note;statement staff note,statement for supplements;statement for supplements public note;statement for supplements staff note,statement for indexes;statement for indexes public note;statement for indexes staff note,Limited lending policy,digitisation policy,retention policy,Note;a note;false;diku;b160f13a-ddba-4053-b9c4-60ec5ea45d56|Note;a note;true;diku;b160f13a-ddba-4053-b9c4-60ec5ea45d56,"URL relationship;URI;Link text;Materials specified;URL public note
Version of resource;www.someurl.com;link text;material;www.someurl.com",aquisition method,order format,receipt status,
855d4693-4087-4339-8c14-c25c350e5e59,title,,ho14,FOLIO,,,,,Main Library,,,,,,,,,,,,,,,,,,,,
@@ -1,4 +1,4 @@
-Holdings record id,Version,HRID,Holdings type,Former ids,"Instance (Title, Publisher, Publication date)",Permanent location,Temporary location,Effective location,Electronic access,Call number type,Call number prefix,Call number,Call number suffix,Shelving title,Acquisition format,Acquisition method,Receipt status,Administrative note,Notes,Ill policy,Retention policy,Digitization policy,Holdings statements,Holdings statements for indexes,Holdings statements for supplements,Copy number,Number of items,Receiving history,Discovery suppress,Statistical codes,Tags,Source
+Holdings record id,Version,HRID,Holdings type,Former ids,"Instance (Title, Publisher, Publication date)",Permanent location,Temporary location,Effective location,Electronic access,Call number type,Call number prefix,Call number,Call number suffix,Shelving title,Acquisition format,Acquisition method,Receipt status,Administrative note,Notes,Ill policy,Retention policy,Digitization policy,Holdings statements,Holdings statements for indexes,Holdings statements for supplements,Copy number,Number of items,Receiving history,Discovery suppress,Statistical codes,Tags,Source
0b1e3760-f689-493e-a98e-9cc9dadb7e83,1,ho13,Physical,d1670310-ceac-47d9-aaba-aaeeb890bc07,Sample instance;5bf370e0-8cca-4d9c-82e4-5170ab2a0a39,Annex,Main Library,Main Library,www.someurl.com;link text;material;www.someurl.com;Version of resource,LC Modified,prefix,call number,suffix,shelving title,order format,aquisition method,receipt status,Test administrative note,Note;a note;false|Note;a note;true,Limited lending policy,retention policy,digitisation policy,statement;statement public note;statement staff note,statement for indexes;statement for indexes public note;statement for indexes staff note,statement for supplements;statement for supplements public note;statement for supplements staff note,copy number,10,|true;enum;chronology|;enum2;chronology2,,"Book, print (books)",,FOLIO
d29b9fa5-c7f5-4b77-9f99-79e5c3a9ae75,1,ho14,,,Sample instance;5bf370e0-8cca-4d9c-82e4-5170ab2a0a39,Main Library,Annex,Annex,,,,,,,,,,,,,,,,,,,,,,,,FOLIO
855d4693-4087-4339-8c14-c25c350e5e59,1,ho15,,,Sample instance;5bf370e0-8cca-4d9c-82e4-5170ab2a0a39,Main Library,Main Library,Main Library,,,,,,,,,,,,,,,,,,,,,,,,MARC
@@ -1,4 +1,4 @@
-Holdings UUID,"Instance (Title, Publisher, Publication date)",Suppress from discovery,Holdings HRID,Source,Former holdings Id,Holdings type,Statistical codes,Administrative note,Holdings permanent location,Holdings temporary location,Shelving title,Holdings copy number,Holdings level call number type,Holdings level call number prefix,Holdings level call number,Holdings level call number suffix,Number of items,Holdings statement,Holdings statement for supplements,Holdings statement for indexes,ILL policy,Digitization policy,Retention policy,Notes,Electronic access,Acquisition method,Order format,Receipt status,Tags
+Holdings UUID,"Instance (Title, Publisher, Publication date)",Suppress from discovery,Holdings HRID,Source,Former holdings Id,Holdings type,Statistical codes,Administrative note,Holdings permanent location,Holdings temporary location,Shelving title,Holdings copy number,Holdings level call number type,Holdings level call number prefix,Holdings level call number,Holdings level call number suffix,Number of items,Holdings statement,Holdings statement for supplements,Holdings statement for indexes,ILL policy,Digitization policy,Retention policy,Notes,Electronic access,Acquisition method,Order format,Receipt status,Tags
855d4693-4087-4339-8c14-c25c350e5e59,title,,ho14,FOLIO,,1d51c4da-0d5f-4303-a922-3e9d7a9b22f3,,,Main Library,,,,,,,,,,,,,,,,,,,,
f836ca1d-c55a-4689-9b57-bb0de0c4d43d,title,,ho14,FOLIO,,,,,Main Library,,,,1d51c4da-0d5f-4303-a922-3e9d7a9b22f3,,,,,,,,,,,,,,,,
c10bb70f-b0fd-4623-a8cf-25e5198087ad,title,,ho14,FOLIO,,,,,Main Library,,,,,,,,,,,,,,,1d51c4da-0d5f-4303-a922-3e9d7a9b22f3;a note;false;diku;1d51c4da-0d5f-4303-a922-3e9d7a9b22f3,,,,,
@@ -1,2 +1,2 @@
-Instance UUID,Suppress from discovery,Staff suppress,Previously held,Instance HRID,Source,Cataloged date,Instance status term,Mode of issuance,Administrative note,Resource title,Index title,Series statements,Contributors,Edition,Physical description,Resource type,Nature of content,Formats,Languages,Publication frequency,Publication range,Notes
+Instance UUID,Suppress from discovery,Staff suppress,Previously held,Instance HRID,Source,Cataloged date,Instance status term,Mode of issuance,Administrative note,Resource title,Index title,Series statements,Contributors,Edition,Physical description,Resource type,Nature of content,Formats,Languages,Publication frequency,Publication range,Notes
7fbd5d84-62d1-44c6-9c45-6cb173998bbd,false,false,true,inst000000000006,FOLIO,2024-01-26,,,Cataloging data,Bridget Jones's Baby: the diaries,,,"Fielding, Helen",First American Edition,219 pages ; 20 cm.,6312d172-f0cf-40f6-b27d-9fa8feaf332f,,,eng,A frequency description,A publication range,
@@ -1,4 +1,4 @@
-inst000000000022,Instance type was not found by id: [6312d172-f0cf-40f6-b27d-9fa8feaf332f]
+inst000000000022,Instance type was not found by id: [6312d172-f0cf-40f6-b27d-9fa8feaf332f]
inst000000000022,Nature of content term was not found by id: [921e6d93-bafb-4a02-b62f-dcd027c45406]
inst000000000022,Instance format was not found by id: [5cb91d15-96b1-4b8a-bf60-ec310538da66]
inst000000000003,No match found
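
In each fixture diff above, the changed first line reads the same before and after, presumably because the only difference is a leading UTF-8 BOM, which does not render. A test or reader that compares such files byte-for-byte needs to account for it. The sketch below shows one way to detect and strip the BOM; the class `BomCheck` and both method names are hypothetical and not part of the PR. Java's UTF-8 decoder does not drop a leading BOM on its own, so decoding the raw bytes leaves a `U+FEFF` character at the start of the string.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical utility, for illustration only.
final class BomCheck {
  private BomCheck() {
  }

  /** True if the data begins with the bytes EF BB BF. */
  public static boolean startsWithUtf8Bom(byte[] data) {
    return data.length >= 3
        && data[0] == (byte) 0xEF
        && data[1] == (byte) 0xBB
        && data[2] == (byte) 0xBF;
  }

  /** Decodes the data as UTF-8, skipping a leading BOM when present. */
  public static String decodeWithoutBom(byte[] data) {
    int offset = startsWithUtf8Bom(data) ? 3 : 0;
    return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] file = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '7', '8', '9'};
    System.out.println(startsWithUtf8Bom(file)); // prints true
    System.out.println(decodeWithoutBom(file));  // prints 789
  }
}
```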