Handle non-UTF-8 compatible names more gracefully #42
Labels
enhancement
New feature or request
Comments
fsgeek
added a commit
that referenced
this issue
Jan 31, 2024
…of the common behavior, so before I merge this I'll verify that I can still index my Windows files as well. In addition, I made changes to the generic layer to handle Issue #37, which relates to handling symlinks versus files. I think this issue may require additional work but it should handle broken links now by ignoring them. Finally, I did not create an issue but while working through the Linux local indexer I found a peculiar case of file names that could not be UTF-8 encoded. For now I log the file name and keep going, but I'll open a new issue for this. See Issue #42.
hadisinaee
added a commit
that referenced
this issue
Apr 1, 2024
* This is work to add support for the Linux Indexer. As part of this I added the logic for the IndalekoLinuxMachineConfig.py so it now creates the config file and adds that to the database. This also provides a preliminary version of the IndalekoLinuxLocalIndexer.py, though that remains a work in progress. Some refactoring along the way as I pull some code into the generic layers.
* Make the Linux local file system indexer work. This has changed some of the common behavior, so before I merge this I'll verify that I can still index my Windows files as well. In addition, I made changes to the generic layer to handle Issue #37, which relates to handling symlinks versus files. I think this issue may require additional work but it should handle broken links now by ignoring them. Finally, I did not create an issue but while working through the Linux local indexer I found a peculiar case of file names that could not be UTF-8 encoded. For now I log the file name and keep going, but I'll open a new issue for this. See Issue #42.
* Sync local and remote changes. Nothing material.
* This is further work on Issue #23 and Issue #24. At this point the Linux indexer and ingester do seem to be gathering data, so it is a reasonable time to capture the current state. Before merging this change in I'd like to make sure it doesn't break other platforms.
* Further cleanup, fixed issue with counts in Linux ingester, add logic to track good and bad symlinks in indexer. See Issue #23 #24 #37
* More cleanup for Issues #23 #24
* Add counters to allow checking indexer output against ingester input/output
* Add uuid generation into Indexer body. Still requires changing ingester(s) to use the UUID as the primary key.
* Handle situation where there is no st_birthtime field in the stat data.
* Issue #47. These changes are prospective, but are identical to what was done on Windows (where it worked).
* Use UUID for new data ingester.
* Create python-package.yml

Co-authored-by: Tony Mason <[email protected]>
While building the Linux local indexer (Issue #23) I found that some files in my collection have names that cannot be encoded as UTF-8, which breaks the jsonl library (it does not appear to allow extending the encoding). For the time being, I log the error and move on, since it isn't a major issue yet.
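One possible way to handle this more gracefully, sketched below, is to work with the raw bytes of the name and store a lossless hex copy next to a lossy display form, so the record stays JSON-serializable. The function name `name_record` and the field names `name`, `name_bytes_hex`, and `valid_utf8` are hypothetical and not part of the existing indexer; this is a sketch under those assumptions, not the project's implementation.

```python
import json
import logging


def name_record(raw_name: bytes) -> dict:
    """Build a JSON-serializable record for a raw (bytes) file name.

    Hypothetical helper: the field names are illustrative only.
    """
    try:
        # Common case: the name is valid UTF-8 and can be stored directly.
        return {"name": raw_name.decode("utf-8"), "valid_utf8": True}
    except UnicodeDecodeError:
        # Keep the current behavior of logging the name, but do not drop it.
        logging.warning("File name is not valid UTF-8: %r", raw_name)
        return {
            # Lossy form for display: bad bytes become U+FFFD.
            "name": raw_name.decode("utf-8", errors="replace"),
            # Lossless form: hex of the original bytes, recoverable later.
            "name_bytes_hex": raw_name.hex(),
            "valid_utf8": False,
        }


def to_jsonl_line(raw_name: bytes) -> str:
    """Serialize one record as a single JSON Lines entry."""
    return json.dumps(name_record(raw_name))
```

To get raw names in the first place, `os.scandir` can be called with a bytes path (for example `os.scandir(b".")`), in which case each entry's `name` is a `bytes` object. The original bytes can be recovered from a record with `bytes.fromhex(record["name_bytes_hex"])`.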