Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] Handle importing UTF-16 encoded CSV files #568

Conversation

akhilsharmaa
Copy link
Contributor

@akhilsharmaa akhilsharmaa commented Nov 22, 2024

The issue was that all CSV files were being only read in UTF-8 format, causing failures when importing UTF-16 encoded files.

  • Added support for detecting file encoding using the popular Python library chardet.
  • If chardet fails to recognize the encoding, explicitly handle the files as UTF-16. This change ensures proper handling of both UTF-8 and UTF-16 encoded CSV files.

Fixes #550

*Screen recording includes "Testing the files included for testing "
screen-recording.webm

akhilsharmaa and others added 3 commits November 22, 2024 23:12
The issue was that all CSV files were being read in UTF-8 format, causing failures when importing UTF-16 encoded files.

- Added support for detecting file encoding using the popular Python library chardet.
- If chardet fails to recognize the encoding, explicitly handle the files as UTF-16.
This change ensures proper handling of both UTF-8 and UTF-16 encoded CSV files.

Fixes openwisp#550
…utf-8", "utf-16", "empty data", and invalid data.
Copy link
Member

@nemesifier nemesifier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the unit test for the new function you added.
However, a test which replicates the original bug is still missing. We need a test which performs the steps described in the issue to replicate the bug, the test must fail in the same way as when the application is used manually (eg: a post request to import the CSV), once the the bug is fixed the test would pass.



def validate_csvfile(csvfile):
csv_data = csvfile.read()
try:
csv_data = decode_csv_data(csv_data)
if isinstance(csv_data, bytes):
csv_data = csv_data.decode(get_encoding_format(csv_data))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if csv_data it's not an instance of bytes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then, decoding the data is not needed. so we do nothing.
actually previously
csv_data = csv_data.decode('utf-8') if isinstance(csv_data, bytes) else csv_data
logic is still same.

openwisp_radius/utils.py Outdated Show resolved Hide resolved
…which was producing error) earlier.

-  test_batch_utf16_file2.csv = Eap.G720.accountstest.csv
-  test_batch_utf16_file1.csv = import-bug-utf16.csv
openwisp_radius/tests/test_utils.py Show resolved Hide resolved
@@ -13,3 +13,4 @@ openwisp-monitoring @ https://github.com/openwisp/openwisp-monitoring/tarball/ma
django-redis~=5.4.0
mock-ssh-server~=0.9.1
channels_redis~=4.2.1
chardet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this dependency used only in tests? Can you point out where it is used? Pelase also specify the version as in the lines above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@nemesifier nemesifier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build is failing.

@akhilsharmaa
Copy link
Contributor Author

akhilsharmaa commented Dec 4, 2024

The build is failing.

Please rerun the build once. I have fixed the errors, I run openwisp-qa-format. some files are also refactored which was not part of this PR. So, I also included those file changes with this commit, so that better for next time.

Copy link
Member

@nemesifier nemesifier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rerun the build once. I have fixed the errors, I run openwisp-qa-format. some files are also refactored which was not part of this PR. So, I also included those file changes with this commit, so that better for next time.

@akhilsharmaa the build is still failing. Please take the time to read the build output to understand how to replicate the same results locally, fix the problems in your dev env and then push when you're done. There's more information about all of this in the contributing guidelines.

@akhilsharmaa
Copy link
Contributor Author

akhilsharmaa commented Dec 4, 2024

please re-run it again. i reproduced & fixed the workflow error, actually it was just because i was using the "black-v24" but the workflow is usign "black-v23". Now it should pass the QA-test.

@nemesifier
Copy link
Member

please re-run it again. i reproduced & fixed the workflow error, actually it was just because i was using the "black-v24" but the workflow is usign "black-v23". Now it should pass the QA-test.

@akhilsharmaa still failing.

@akhilsharmaa
Copy link
Contributor Author

@nemesifier, please re-run once. let's see how it's performing now.

@coveralls
Copy link

coveralls commented Dec 5, 2024

Coverage Status

coverage: 98.654% (+0.005%) from 98.649%
when pulling ae29c39 on akhilsharmaa:Issues/550-failure-on-importing-utf-16-files
into 52d5bef on openwisp:master.

Copy link
Member

@nemesifier nemesifier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks a lot better thanks! Give me some time to do a bit of manual testing before merging.

Copy link
Member

@pandafy pandafy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have done a manual test by uploading a UTF-16 encoded CSV file (generated by LibreCalc) and the code handles the file without any issues.

requirements-test.txt Outdated Show resolved Hide resolved
@@ -129,17 +129,40 @@ def find_available_username(username, users_list, prefix=False):
return tmp


def get_encoding_format(byte_data):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to implement this function instead of using the chardet library?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, because chardet was not detecting the utf-8-sig format encodings.

openwisp_radius/tests/static/test_batch_utf16_file1.csv Outdated Show resolved Hide resolved
@nemesifier nemesifier changed the title Fixes: Handle importing UTF-16 encoded CSV files (#550) [fix] Handle importing UTF-16 encoded CSV files (#550) Jan 24, 2025
@nemesifier nemesifier changed the title [fix] Handle importing UTF-16 encoded CSV files (#550) [fix] Handle importing UTF-16 encoded CSV files Jan 24, 2025
@akhilsharmaa
Copy link
Contributor Author

akhilsharmaa commented Jan 27, 2025

@pandafy @nemesifier, please run the build. i have removed the un-used chardet dependency

@akhilsharmaa
Copy link
Contributor Author

@nemesifier, build it please

@nemesifier
Copy link
Member

@akhilsharmaa Build passed 👍

@pandafy pandafy requested a review from nemesifier January 31, 2025 08:09
Copy link
Member

@nemesifier nemesifier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some manual testing, and I verified that I could successfully import the utf-16 file that was originally causing the issue.
I also added another test file which when opened with libreoffice is automatically detected as utf-16, just to be sure.

Thanks! 👍

@nemesifier nemesifier merged commit 7c7fc4a into openwisp:master Jan 31, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

[bug] CSV import: failure on importing UTF16 encoded CSV files fails with 500 internal server error
4 participants