Add UTF-8 BOM to files written to S3/Minio #567

Open
wants to merge 3 commits into
base: master

Conversation

DerkaouiAnas

MODEXPW-508 - Add UTF-8 BOM to files written to S3/Minio

Purpose

This change fixes an issue where Excel does not display Arabic characters correctly in CSV files exported from our system. Prepending a UTF-8 Byte Order Mark (BOM) to these files lets applications such as Excel recognize the encoding as UTF-8 and render the Arabic characters properly.
Related JIRA issue: https://issues.folio.org/browse/MODEXPW-508

Approach

To implement this change, we've modified the write method in our S3/Minio file writing utility. The approach involves:

  • Defining the UTF-8 BOM as a byte array at the beginning of the method.
  • Using a ByteArrayOutputStream to combine the BOM with the original file content.
  • Writing the combined byte array (BOM + original content) to S3 or Minio.
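
The three steps above can be sketched roughly as follows. This is a minimal, standalone illustration rather than the actual code in the PR: the class `BomPrependSketch` and the helper `prependBom` are hypothetical names, and the final upload to S3 or Minio is left out because the client call is not shown in this description.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch class; not part of the module under review.
final class BomPrependSketch {

  // Hypothetical helper: returns a new array holding the UTF-8 BOM
  // followed by the original content.
  static byte[] prependBom(byte[] content) throws IOException {
    // Step 1: define the UTF-8 BOM bytes (EF BB BF).
    byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Step 2: combine the BOM and the original content in memory.
    ByteArrayOutputStream out = new ByteArrayOutputStream(bom.length + content.length);
    out.write(bom);
    out.write(content);

    // Step 3 (not shown): the returned array is what would be handed
    // to the S3/Minio client instead of the raw content.
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    // CSV sample with an Arabic word, written with Unicode escapes so the
    // source file itself stays ASCII.
    byte[] csv = "name\n\u0645\u0631\u062D\u0628\u0627\n".getBytes(StandardCharsets.UTF_8);
    byte[] withBom = prependBom(csv);
    System.out.println("Bytes added: " + (withBom.length - csv.length));
  }
}
```

Note that the whole file is held in memory twice while the arrays are combined, which is fine for small exports but may matter for large ones.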

This change addresses issues with Excel not properly displaying Arabic
characters in CSV files. By adding the BOM, we ensure that applications
like Excel correctly recognize the file encoding as UTF-8.

CLAassistant commented Sep 17, 2024

CLA assistant check
All committers have signed the CLA.

@@ -177,16 +178,23 @@ public String upload(String path, String filename) throws IOException {
* @throws IOException - if an I/O error occurs
*/
public String write(String path, byte[] bytes, Map<String, String> headers) throws IOException {
byte[] bom = {(byte)0xEF, (byte)0xBB, (byte)0xBF};
Contributor

Perhaps it makes sense to make this a constant instead of initializing the array in the method each time?
private static final byte[] UTF8_BOM = new byte[]{(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
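
For illustration, here is one way the suggested constant might be used. This is only a sketch: the class `BomConstantSketch` and the helper `withBom` are hypothetical names, and it uses `System.arraycopy` in place of the `ByteArrayOutputStream` described in the PR, purely to keep the example short.

```java
// Hypothetical sketch class; not part of the module under review.
final class BomConstantSketch {

  // Allocated once at class load instead of on every call.
  // Kept private because Java arrays are mutable even when the field is final.
  private static final byte[] UTF8_BOM = new byte[]{(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

  // Hypothetical helper: copies the BOM and then the content into one new array.
  static byte[] withBom(byte[] content) {
    byte[] result = new byte[UTF8_BOM.length + content.length];
    System.arraycopy(UTF8_BOM, 0, result, 0, UTF8_BOM.length);
    System.arraycopy(content, 0, result, UTF8_BOM.length, content.length);
    return result;
  }
}
```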

@khandramai khandramai self-requested a review November 12, 2024 12:40
3 participants