Handle non-UTF-8 compatible names more gracefully #42

Open
fsgeek opened this issue Jan 30, 2024 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

@fsgeek
Contributor

fsgeek commented Jan 30, 2024

While building the Linux local indexer (Issue #23) I found that some file names in my collection cannot be encoded as UTF-8. This breaks the jsonl library, which does not appear to allow extending the encoding. For the time being I log the error and move on, since it isn't a major issue yet.
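
One possible way to keep such names in the index instead of skipping them, sketched with the standard-library `json` module rather than the jsonl library mentioned above. The helper names `name_record` and `to_jsonl_line` and the `name_raw_hex` field are hypothetical, and the sketch assumes the names come from `os.listdir()` or `os.scandir()` on a UTF-8 filesystem, where Python represents undecodable bytes as lone surrogates:

```python
import json


def name_record(name: str) -> dict:
    """Build a JSON-safe record for a file name (hypothetical helper).

    Assumes `name` came from os.listdir()/os.scandir() on a UTF-8 filesystem,
    where Python maps undecodable bytes to lone surrogates (surrogateescape).
    """
    try:
        name.encode("utf-8")  # raises if the name contains lone surrogates
        return {"name": name}
    except UnicodeEncodeError:
        # Recover the original bytes, then keep both a readable approximation
        # and a lossless hex copy so nothing is dropped from the index.
        raw = name.encode("utf-8", errors="surrogateescape")
        return {
            "name": raw.decode("utf-8", errors="replace"),
            "name_raw_hex": raw.hex(),
        }


def to_jsonl_line(record: dict) -> str:
    """Serialize one record as a single JSON Lines entry."""
    return json.dumps(record, ensure_ascii=False) + "\n"
```

The hex copy lets a later stage reconstruct the exact on-disk name with `bytes.fromhex()`, while the `name` field stays valid UTF-8 for anything that only needs to display it.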

@fsgeek fsgeek added the enhancement New feature or request label Jan 30, 2024
@fsgeek fsgeek self-assigned this Jan 30, 2024
fsgeek added a commit that referenced this issue Jan 31, 2024
…of the common behavior, so before I merge this I'll verify that I can still index my Windows files as well.

In addition, I made changes to the generic layer to handle Issue #37, which relates to handling symlinks versus files.  I think this issue may require additional work but it should handle broken links now by ignoring them.

Finally, I did not create an issue but while working through the Linux local indexer I found a peculiar case of file names that could not be UTF-8 encoded.  For now I log the file name and keep going, but I'll open a new issue for this.  See Issue #42.
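
For reference, a minimal sketch of how broken links can be told apart from working ones so the indexer can ignore them. This is not the code from the commit; `classify_entry` and its return labels are hypothetical:

```python
import os


def classify_entry(path: str) -> str:
    """Return 'symlink', 'broken_symlink', 'dir', 'file', or 'other'."""
    if os.path.islink(path):
        # os.path.exists() follows the link, so False means the target is gone.
        return "symlink" if os.path.exists(path) else "broken_symlink"
    if os.path.isdir(path):
        return "dir"
    if os.path.isfile(path):
        return "file"
    return "other"


def indexable(paths):
    """Yield only the paths an indexer would keep, skipping broken links."""
    for path in paths:
        if classify_entry(path) != "broken_symlink":
            yield path
```

Checking `islink` first matters: `os.path.isfile()` and `os.path.isdir()` both follow links, so a working link to a file would otherwise be reported as a plain file.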
hadisinaee added a commit that referenced this issue Apr 1, 2024
* This is work to add support for the Linux Indexer.

As part of this I added the logic for the IndalekoLinuxMachineConfig.py so it now creates the config file and adds that to the database.

This also provides a preliminary version of the IndalekoLinuxLocalIndexer.py, though that remains a work in progress.

Some refactoring along the way as I pull some code into the generic layers.

* Make the Linux local file system indexer work.  This has changed some of the common behavior, so before I merge this I'll verify that I can still index my Windows files as well.

In addition, I made changes to the generic layer to handle Issue #37, which relates to handling symlinks versus files.  I think this issue may require additional work but it should handle broken links now by ignoring them.

Finally, I did not create an issue but while working through the Linux local indexer I found a peculiar case of file names that could not be UTF-8 encoded.  For now I log the file name and keep going, but I'll open a new issue for this.  See Issue #42.

* Sync local and remote changes. Nothing material.

* This is further work on Issue #23 and Issue #24.
At this point the Linux indexer and ingester do seem to be gathering data, so it is a reasonable time to capture
the current state. Before merging this change I'd like to make sure it doesn't break other platforms.

* Further cleanup, fixed issue with counts in Linux ingester, added logic to track good and bad symlinks in indexer.  See Issues #23, #24, #37

* More cleanup for Issues #23 #24

* Add counters to allow checking indexer output against ingester input/output

* Add uuid generation into Indexer body.  Still requires changing ingester(s) to use the UUID as the primary key.

* Handle situation where there is no st_birthtime field in the stat data.

* Issue #47.  These changes are prospective, but are identical to what was done on Windows (where it worked).

* Use UUID for new data ingester.

* Create python-package.yml

---------

Co-authored-by: Tony Mason <[email protected]>
Co-authored-by: Tony Mason <[email protected]>
Co-authored-by: Tony Mason <[email protected]>
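
One of the commits above handles stat data that has no `st_birthtime` field. A minimal sketch of that kind of fallback follows; `creation_time` is a hypothetical name, and falling back to `st_ctime` is an assumption rather than what the commit necessarily does:

```python
import os


def creation_time(st: os.stat_result) -> float:
    """Best-effort creation timestamp from stat data.

    st_birthtime is only present on some platforms. When it is missing,
    this falls back to st_ctime, which on Linux is the inode change time
    and not a true creation time.
    """
    return getattr(st, "st_birthtime", st.st_ctime)
```

A caller would use it as `creation_time(os.stat(path))`. Recording which field supplied the value may be worth doing if downstream queries depend on real creation times.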