-
-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix] Handle importing UTF-16 encoded CSV files #568
[fix] Handle importing UTF-16 encoded CSV files #568
Conversation
The issue was that all CSV files were being read in UTF-8 format, causing failures when importing UTF-16 encoded files. - Added support for detecting file encoding using the popular Python library chardet. - If chardet fails to recognize the encoding, explicitly handle the files as UTF-16. This change ensures proper handling of both UTF-8 and UTF-16 encoded CSV files. Fixes openwisp#550
…utf-8", "utf-16", "empty data", and invalid data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the unit test for the new function you added.
However, a test which replicates the original bug is still missing. We need a test which performs the steps described in the issue to replicate the bug, the test must fail in the same way as when the application is used manually (eg: a post request to import the CSV), once the the bug is fixed the test would pass.
openwisp_radius/utils.py
Outdated
|
||
|
||
def validate_csvfile(csvfile): | ||
csv_data = csvfile.read() | ||
try: | ||
csv_data = decode_csv_data(csv_data) | ||
if isinstance(csv_data, bytes): | ||
csv_data = csv_data.decode(get_encoding_format(csv_data)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if csv_data it's not an instance of bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then, decoding the data is not needed. so we do nothing.
actually previously
csv_data = csv_data.decode('utf-8') if isinstance(csv_data, bytes) else csv_data
logic is still same.
…which was producing error) earlier. - test_batch_utf16_file2.csv = Eap.G720.accountstest.csv - test_batch_utf16_file1.csv = import-bug-utf16.csv
requirements-test.txt
Outdated
@@ -13,3 +13,4 @@ openwisp-monitoring @ https://github.com/openwisp/openwisp-monitoring/tarball/ma | |||
django-redis~=5.4.0 | |||
mock-ssh-server~=0.9.1 | |||
channels_redis~=4.2.1 | |||
chardet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this dependency used only in tests? Can you point out where it is used? Pelase also specify the version as in the lines above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used for detecting the file format,
https://github.com/akhilsharmaa/openwisp-radius/blob/01fe5c4e2273cfd915d331825ca0686c636880b9/openwisp_radius/utils.py#L141
…cified the "chardet" dependency version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The build is failing.
Please rerun the build once. I have fixed the errors, I run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rerun the build once. I have fixed the errors, I run
openwisp-qa-format
. some files are also refactored which was not part of this PR. So, I also included those file changes with this commit, so that better for next time.
@akhilsharmaa the build is still failing. Please take the time to read the build output to understand how to replicate the same results locally, fix the problems in your dev env and then push when you're done. There's more information about all of this in the contributing guidelines.
please re-run it again. i reproduced & fixed the workflow error, actually it was just because i was using the "black-v24" but the workflow is usign "black-v23". Now it should pass the QA-test. |
@akhilsharmaa still failing. |
…b files, improved logic, more clean code
@nemesifier, please re-run once. let's see how it's performing now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks a lot better thanks! Give me some time to do a bit of manual testing before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have done a manual test by uploading a UTF-16 encoded CSV file (generated by LibreCalc) and the code handles the file without any issues.
@@ -129,17 +129,40 @@ def find_available_username(username, users_list, prefix=False): | |||
return tmp | |||
|
|||
|
|||
def get_encoding_format(byte_data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to implement this function instead of using the chardet library?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, because chardet was not detecting the utf-8-sig
format encodings.
@pandafy @nemesifier, please run the build. i have removed the un-used chardet dependency |
@nemesifier, build it please |
@akhilsharmaa Build passed 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some manual testing, and I verified that I could successfully import the utf-16 file that was originally causing the issue.
I also added another test file which when opened with libreoffice is automatically detected as utf-16, just to be sure.
Thanks! 👍
The issue was that all CSV files were being only read in UTF-8 format, causing failures when importing UTF-16 encoded files.
Fixes #550
*Screen recording includes "Testing the files included for testing "
screen-recording.webm