[fix] Handle importing UTF-16 encoded CSV files #568

akhilsharmaa · 2024-11-22T18:00:53Z

All CSV files were being read as UTF-8 only, which caused failures when importing UTF-16 encoded files.

Added support for detecting the file encoding using the popular Python library chardet.
If chardet fails to recognize the encoding, the file is explicitly handled as UTF-16. This change ensures proper handling of both UTF-8 and UTF-16 encoded CSV files.

Fixes #550
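
The fallback described above can be illustrated with a minimal sketch. It is not the code from this PR: `decode_with_fallback` and `utf8_or_unknown` are hypothetical names, and the detector is a standard-library stand-in for chardet (whose `detect()` result can report the encoding as `None`).

```python
from typing import Callable, Optional


def decode_with_fallback(raw: bytes, detect: Callable[[bytes], Optional[str]]) -> str:
    """Decode with the detected encoding; treat "not recognized" as UTF-16."""
    encoding = detect(raw) or "utf-16"
    return raw.decode(encoding)


def utf8_or_unknown(raw: bytes) -> Optional[str]:
    """Stand-in detector (assumption): reports UTF-8 or gives up with None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "utf-8"


header = "username,email\n"
from_utf8 = decode_with_fallback(header.encode("utf-8"), utf8_or_unknown)
from_utf16 = decode_with_fallback(header.encode("utf-16"), utf8_or_unknown)
```

Both calls should return the same text. Note that this stand-in is naive: UTF-16 data without a byte order mark that contains only ASCII characters also decodes as UTF-8 (with embedded NUL characters), so a real detector needs to be stricter.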

Screen recording (includes testing with the files added for testing):
screen-recording.webm

nemesifier

Thanks for adding the unit test for the new function.
However, a test that replicates the original bug is still missing. We need a test that performs the steps described in the issue (e.g. a POST request to import the CSV) and fails in the same way as when the application is used manually. Once the bug is fixed, the test should pass.
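
As a rough illustration of the shape such a regression test could take, here is a standard-library sketch. It is an assumption-laden stand-in: `import_users` is a hypothetical function replacing the real import path, and the project's actual test would go through a POST request with the Django test client, which is not shown in this thread.

```python
import csv
import io
import unittest


def import_users(uploaded_file):
    """Hypothetical stand-in for the real CSV import path."""
    raw = uploaded_file.read()
    # Stand-in decoding rule: a UTF-16 byte order mark selects UTF-16.
    encoding = "utf-16" if raw[:2] in (b"\xff\xfe", b"\xfe\xff") else "utf-8"
    return list(csv.DictReader(io.StringIO(raw.decode(encoding))))


class Utf16ImportRegressionTest(unittest.TestCase):
    def test_utf16_csv_is_imported(self):
        # Same kind of input that triggered the bug: a UTF-16 encoded upload.
        payload = "username,email\nalice,alice@example.com\n".encode("utf-16")
        rows = import_users(io.BytesIO(payload))
        self.assertEqual(
            rows, [{"username": "alice", "email": "alice@example.com"}]
        )
```

With a UTF-8-only importer this test would fail with `UnicodeDecodeError`, which is the behaviour the reviewer asks the test to capture. It can be run with `python -m unittest`.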

nemesifier · 2024-11-26T14:04:11Z

openwisp_radius/utils.py



 def validate_csvfile(csvfile):
    csv_data = csvfile.read()
    try:
-        csv_data = decode_csv_data(csv_data)
+        if isinstance(csv_data, bytes): 
+            csv_data = csv_data.decode(get_encoding_format(csv_data))  


What happens if csv_data is not an instance of bytes?

Then decoding is not needed, so we do nothing.
The previous code was
csv_data = csv_data.decode('utf-8') if isinstance(csv_data, bytes) else csv_data
so the logic is still the same.
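
The point of the `isinstance` guard can be shown with a small sketch: file objects opened in binary mode return `bytes` from `read()`, while text-mode objects already return `str`. The helper name `read_csv_text` and its `encoding` parameter are hypothetical and only serve the illustration.

```python
import io


def read_csv_text(csvfile, encoding="utf-8"):
    """Return the file content as str, decoding only when read() gave bytes."""
    data = csvfile.read()
    if isinstance(data, bytes):
        data = data.decode(encoding)
    return data


from_binary = read_csv_text(io.BytesIO("a,b\n".encode("utf-8")))
from_text = read_csv_text(io.StringIO("a,b\n"))
```

Both calls yield the same string, so callers do not need to care how the file was opened.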

openwisp_radius/utils.py

openwisp_radius/tests/test_utils.py

nemesifier · 2024-11-27T13:49:45Z

requirements-test.txt

@@ -13,3 +13,4 @@ openwisp-monitoring @ https://github.com/openwisp/openwisp-monitoring/tarball/ma
 django-redis~=5.4.0
 mock-ssh-server~=0.9.1
 channels_redis~=4.2.1
+chardet


Is this dependency used only in tests? Can you point out where it is used? Please also specify the version, as in the lines above.

It is used for detecting the file encoding:
https://github.com/akhilsharmaa/openwisp-radius/blob/01fe5c4e2273cfd915d331825ca0686c636880b9/openwisp_radius/utils.py#L141

nemesifier

The build is failing.

akhilsharmaa · 2024-12-04T05:25:37Z

> The build is failing.

Please rerun the build. I have fixed the errors by running openwisp-qa-format. Some files that were not part of this PR were also reformatted, so I included those changes in this commit as well.

nemesifier

> Please rerun the build. I have fixed the errors by running openwisp-qa-format. Some files that were not part of this PR were also reformatted, so I included those changes in this commit as well.

@akhilsharmaa the build is still failing. Please take the time to read the build output to understand how to replicate the same results locally, fix the problems in your dev env and then push when you're done. There's more information about all of this in the contributing guidelines.

akhilsharmaa · 2024-12-04T15:49:02Z

Please re-run it. I reproduced and fixed the workflow error: I was using "black-v24" locally, but the workflow is using "black-v23". It should pass the QA checks now.

nemesifier · 2024-12-04T18:43:14Z

> Please re-run it. I reproduced and fixed the workflow error: I was using "black-v24" locally, but the workflow is using "black-v23". It should pass the QA checks now.

@akhilsharmaa still failing.

akhilsharmaa · 2024-12-05T10:40:25Z

@nemesifier, please re-run once. Let's see how it performs now.

coveralls · 2024-12-05T21:20:04Z

coverage: 98.654% (+0.005%) from 98.649%
when pulling ae29c39 on akhilsharmaa:Issues/550-failure-on-importing-utf-16-files
into 52d5bef on openwisp:master.

nemesifier

It looks a lot better thanks! Give me some time to do a bit of manual testing before merging.

pandafy

I did a manual test by uploading a UTF-16 encoded CSV file (generated by LibreCalc), and the code handles the file without any issues.

requirements-test.txt

pandafy · 2025-01-24T07:56:10Z

openwisp_radius/utils.py

@@ -129,17 +129,40 @@ def find_available_username(username, users_list, prefix=False):
    return tmp


+def get_encoding_format(byte_data):


Any reason to implement this function instead of using the chardet library?

Yes, because chardet was not detecting the utf-8-sig encoding.
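
The body of `get_encoding_format` is not shown in this thread. As a hedged sketch of how byte order marks alone can distinguish these encodings without chardet, the following uses only the standard library `codecs` constants; the name `guess_encoding` and the UTF-8 default are assumptions, not the merged implementation.

```python
import codecs


def guess_encoding(byte_data: bytes) -> str:
    """Guess a codec name from the byte order mark, defaulting to UTF-8."""
    if byte_data.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" strips the BOM while decoding.
        return "utf-8-sig"
    if byte_data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order.
        return "utf-16"
    return "utf-8"


sample = "id,name"
with_sig = sample.encode("utf-8-sig")
decoded = with_sig.decode(guess_encoding(with_sig))
```

Decoding BOM-prefixed UTF-8 with plain `"utf-8"` would leave a stray `\ufeff` at the start of the first header field, which is why `"utf-8-sig"` matters here. The sketch ignores UTF-32 and BOM-less UTF-16 input.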

openwisp_radius/tests/static/test_batch_utf16_file1.csv

akhilsharmaa · 2025-01-27T09:20:39Z

@pandafy @nemesifier, please run the build. I have removed the unused chardet dependency.

akhilsharmaa · 2025-01-29T18:34:07Z

@nemesifier, build it please

nemesifier · 2025-01-29T19:08:01Z

@akhilsharmaa Build passed 👍

nemesifier

I did some manual testing and verified that I could successfully import the UTF-16 file that was originally causing the issue.
I also added another test file which, when opened with LibreOffice, is automatically detected as UTF-16, just to be sure.

Thanks! 👍

akhilsharmaa and others added 3 commits November 22, 2024 23:12

Added unit testing for "get_encoding_format()" function testing for "…

e02605d

…utf-8", "utf-16", "empty data", and invalid data.

Merge branch 'master' into Issues/550-failure-on-importing-utf-16-files

0dc7459

nemesifier requested changes Nov 26, 2024

Add unit test, reproducing the bug#550 in unit testing, added files (…

bf3947c

…which was producing error) earlier. - test_batch_utf16_file2.csv = Eap.G720.accountstest.csv - test_batch_utf16_file1.csv = import-bug-utf16.csv

akhilsharmaa requested a review from nemesifier November 27, 2024 11:20

nemesifier requested changes Nov 27, 2024

Refactored code (defined a seperate function for the assertion), sepe…

01fe5c4

…cified the "chardet" dependency version

akhilsharmaa requested a review from nemesifier November 27, 2024 14:16

akhilsharmaa added 3 commits November 28, 2024 23:48

Improved logic of the get_encoding_format function.

a45fd96

removed unwanted comments

1d7a7b3

Merge branch 'master' into Issues/550-failure-on-importing-utf-16-files

19f0580

nemesifier requested changes Dec 3, 2024

[change] Refactored code by running openwisp-qa-format

6fa896f

akhilsharmaa requested a review from nemesifier December 4, 2024 05:26

nemesifier requested changes Dec 4, 2024

[changes] refactored using `black==23.12.1"

e341735

akhilsharmaa requested a review from nemesifier December 4, 2024 15:49

[changes] Removed use of chardet because it was unstable on utf-8 blo…

ee0b2cf

…b files, improved logic, more clean code

akhilsharmaa and others added 2 commits December 5, 2024 19:57

Merge branch 'master' into Issues/550-failure-on-importing-utf-16-files

8f892c0

Merge branch 'master' into Issues/550-failure-on-importing-utf-16-files

f26e44b

nemesifier reviewed Dec 11, 2024

pandafy reviewed Jan 24, 2025

nemesifier changed the title ~~Fixes: Handle importing UTF-16 encoded CSV files (#550)~~ [fix] Handle importing UTF-16 encoded CSV files (#550) Jan 24, 2025

nemesifier changed the title ~~[fix] Handle importing UTF-16 encoded CSV files (#550)~~ [fix] Handle importing UTF-16 encoded CSV files Jan 24, 2025

[changes] Removed unused dependency chardet

1fa2894

nemesifier and others added 2 commits January 27, 2025 11:42

Merge branch 'master' into Issues/550-failure-on-importing-utf-16-files

7b0b9d0

Merge branch 'master' into Issues/550-failure-on-importing-utf-16-files

4e5f0a7

pandafy requested a review from nemesifier January 31, 2025 08:09

[tests] Added one more utf16 encoded test file

ae29c39

nemesifier approved these changes Jan 31, 2025

nemesifier merged commit 7c7fc4a into openwisp:master Jan 31, 2025
12 checks passed
