Use git attributes to identify binary files #2042

Open
kgilpin opened this issue Oct 4, 2024 · 1 comment · May be fixed by #2083

Comments

@kgilpin
Contributor

kgilpin commented Oct 4, 2024

Git metadata can indicate which files are binary. Use git attribute information, in addition to the file extension, in the context search / lookup / collector to determine which files to treat as binary.

@kgilpin kgilpin self-assigned this Oct 4, 2024
@kgilpin kgilpin added enhancement New feature or request navie navie-plan labels Oct 4, 2024

github-actions bot commented Oct 4, 2024

Title: Integrate Git Attributes for Binary File Classification in Context Search

Problem:
The system currently relies on file extensions alone to identify binary files during context searches and lookups. This is imprecise: a binary file with an unfamiliar or missing extension is not recognized as binary. Git lets a repository declare which paths are binary through git attributes (for example, the `binary` attribute in `.gitattributes`), so consulting them improves classification accuracy. The task is to incorporate git attributes alongside file extensions in the context search process, so that it benefits from Git's binary file classification.

Analysis:
To identify binary files accurately, the system should combine the existing file-extension check with Git's metadata. Git attributes can declare specific paths as binary, which can override the default classification by extension. The system can query the attributes that apply to a file with the git check-attr command and determine whether the file is declared binary. Incorporating this into the current mechanism will filter binary files out more reliably during context search and lookup.
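A minimal sketch of the `git check-attr` query described above, in TypeScript (the project's language). It assumes Node's `child_process.execFile`, asks for the `binary` and `diff` attributes with `-z` so the output is NUL-separated, and treats a path as binary when `binary` is set or `diff` is unset (the built-in `binary` macro expands to `-diff -merge -text`). The helper names `parseCheckAttrZ`, `binaryPathsFromRecords`, and `gitBinaryPaths` are hypothetical, not existing appmap-js functions.

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// One record from `git check-attr -z`: <path> NUL <attribute> NUL <info> NUL.
// <info> is "set", "unset", "unspecified", or the attribute's value.
export interface AttrRecord {
  path: string;
  attribute: string;
  info: string;
}

// Parse the NUL-separated (-z) output of `git check-attr`.
export function parseCheckAttrZ(stdout: string): AttrRecord[] {
  const fields = stdout.split('\0');
  const records: AttrRecord[] = [];
  // Output ends with a NUL, so the last split element is empty;
  // only complete <path, attribute, info> triples are consumed.
  for (let i = 0; i + 2 < fields.length; i += 3) {
    records.push({ path: fields[i], attribute: fields[i + 1], info: fields[i + 2] });
  }
  return records;
}

// A path counts as binary when the `binary` macro is set or diffing is disabled (`-diff`).
export function binaryPathsFromRecords(records: AttrRecord[]): Set<string> {
  const binary = new Set<string>();
  for (const { path, attribute, info } of records) {
    if ((attribute === 'binary' && info === 'set') || (attribute === 'diff' && info === 'unset')) {
      binary.add(path);
    }
  }
  return binary;
}

// Hypothetical helper: ask git which of `paths` (relative to `cwd`) are binary
// according to git attributes. Very long path lists may need chunking or `--stdin`.
export async function gitBinaryPaths(cwd: string, paths: string[]): Promise<Set<string>> {
  if (paths.length === 0) return new Set();
  const { stdout } = await execFileAsync(
    'git',
    ['check-attr', '-z', 'binary', 'diff', '--', ...paths],
    { cwd, maxBuffer: 64 * 1024 * 1024 }
  );
  return binaryPathsFromRecords(parseCheckAttrZ(stdout));
}
```

Keeping the parsing separate from the `execFile` call lets the classification logic be unit-tested with canned output, without a git repository.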

Proposed Changes:

  1. File: packages/cli/src/fulltext/FileIndex.ts

    • Modify the filterFiles function to include logic that also checks git attributes to determine if a file should be considered binary.
    • Call the git check-attr command inside the try-catch block of the function where file filtering is performed.
    • If git attributes mark a file as binary, skip adding it to the result, regardless of its extension.
  2. File: packages/cli/src/fulltext/listGitProjectFiles.ts

    • Extend the functions that collect file lists to optionally query git attribute status using git check-attr, so that each returned file is annotated with its git-determined binary status, where applicable.
  3. File: packages/scanner/src/lastGitOrFSModifiedDate.ts

    • Introduce a utility function that encapsulates the logic of determining a file's binary status by calling git check-attr and interpreting the result.
    • Ensure that this utility can be reused across different modules that need to identify binary files.
  4. Integration Points:

    • Update any part of the module that relies on file extension checks for binary files to utilize the new binary determination mechanism using Git attributes.
    • Feed the binary determination results into the processes that collect context or run searches, so that they use the improved binary file recognition.
  5. Testing:

    • Create or modify unit tests in relevant test files, such as packages/cli/tests/unit/fulltext/listGitProjectFiles.spec.ts, to verify that the binary file identification process now accurately incorporates git attributes.
    • Test scenarios should cover cases where only the file extension or only the git attributes indicate binary status, as well as combinations where the two conflict.
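One way the extension check and the git-attribute check could be combined when filtering and annotating file lists, sketched under assumptions: the extension list, the names `isBinaryByExtension`, `annotateBinary`, and `filterTextFiles`, and the rule that a file is binary if either signal says so are all hypothetical. The set of git-binary paths is assumed to come from a query such as `git check-attr`.

```typescript
import { extname } from 'node:path';

// Hypothetical extension list; the real list used by FileIndex.ts may differ.
const BINARY_EXTENSIONS = new Set(['.png', '.jpg', '.jpeg', '.gif', '.zip', '.jar', '.pdf', '.so']);

export function isBinaryByExtension(filePath: string): boolean {
  // Case-insensitive, so "logo.PNG" is caught too.
  return BINARY_EXTENSIONS.has(extname(filePath).toLowerCase());
}

export interface AnnotatedFile {
  path: string;
  binary: boolean;
}

// Annotate each file with a binary flag. A file counts as binary when EITHER
// its extension or its git attributes say so (`gitBinary` holds the paths
// that git attributes mark as binary).
export function annotateBinary(files: string[], gitBinary: ReadonlySet<string>): AnnotatedFile[] {
  return files.map((path) => ({
    path,
    binary: isBinaryByExtension(path) || gitBinary.has(path),
  }));
}

// Keep only the files that should be indexed as text.
export function filterTextFiles(files: string[], gitBinary: ReadonlySet<string>): string[] {
  return annotateBinary(files, gitBinary)
    .filter((file) => !file.binary)
    .map((file) => file.path);
}
```

For example, with `gitBinary = new Set(['vendor/lib.js'])`, `filterTextFiles(['src/index.ts', 'logo.PNG', 'vendor/lib.js'], gitBinary)` keeps only `src/index.ts`: `logo.PNG` is excluded by its extension and `vendor/lib.js` by its attributes. Treating the two signals as a union means an attribute can never make a file with a binary extension indexable; if an explicit `diff` or `text` attribute should override the extension, that rule would need to change.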

By implementing these changes, context search and file indexing become more precise, because they take Git's own binary file classification into account.

dustinbyrne added a commit that referenced this issue Oct 22, 2024
Fixes #2042

Add support for identifying binary files using git attributes in addition to file extensions.

* Add `getGitAttributes` function to read and parse the `.gitattributes` file in `packages/cli/src/fulltext/FileIndex.ts`.
* Update `isBinaryFile` function to check git attributes information in addition to file extensions in `packages/cli/src/fulltext/FileIndex.ts`.
* Update `filterFiles` function to use the `getGitAttributes` function to determine if a file is binary based on git attributes in `packages/cli/src/fulltext/FileIndex.ts`.
* Add tests to verify that the context search/lookup/collector correctly identifies binary files using git attributes information in `packages/cli/tests/unit/fulltext/FileIndex.spec.ts`.
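The commit takes a different route from the plan: it reads `.gitattributes` directly instead of calling `git check-attr`. A simplified sketch of such a parser follows. It borrows the name `getGitAttributes` from the commit message, but its signature and behavior here are assumptions, as are `parseGitAttributes` and `isBinaryByGitAttributes`. It reads only the repository-root `.gitattributes`, ignores nested attribute files, quoted patterns, and `[attr]` macro definitions (apart from expanding the built-in `binary` macro), supports only `*`, `?`, and `**/` in patterns, and treats a file as binary when its last matching `diff` setting is unset.

```typescript
import { readFile } from 'node:fs/promises';
import { join } from 'node:path';

// true = set, false = unset ("-attr"), undefined = unspecified ("!attr"), string = value.
export type AttrState = boolean | string | undefined;

export interface AttrRule {
  matcher: RegExp;
  matchBasename: boolean; // patterns without "/" match the file name at any depth
  attrs: Map<string, AttrState>;
}

// Simplified glob: supports "*", "?", and "**/" only (no "[...]" classes).
function globToRegExp(glob: string): RegExp {
  let re = '';
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === '*' && glob[i + 1] === '*') {
      // "**/" matches zero or more directories; any other "**" matches anything.
      if (glob[i + 2] === '/') {
        re += '(?:.*/)?';
        i += 2;
      } else {
        re += '.*';
        i += 1;
      }
    } else if (c === '*') {
      re += '[^/]*';
    } else if (c === '?') {
      re += '[^/]';
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, '\\$&');
    }
  }
  return new RegExp(`^${re}$`);
}

export function parseGitAttributes(content: string): AttrRule[] {
  const rules: AttrRule[] = [];
  for (const raw of content.split(/\r?\n/)) {
    const line = raw.trim();
    // Skip blanks, comments, and macro definitions (unsupported here).
    if (!line || line.startsWith('#') || line.startsWith('[attr]')) continue;
    const [pattern, ...tokens] = line.split(/\s+/);
    const attrs = new Map<string, AttrState>();
    for (const token of tokens) {
      if (token.startsWith('-')) {
        attrs.set(token.slice(1), false);
      } else if (token.startsWith('!')) {
        attrs.set(token.slice(1), undefined);
      } else if (token.includes('=')) {
        const eq = token.indexOf('=');
        attrs.set(token.slice(0, eq), token.slice(eq + 1));
      } else {
        attrs.set(token, true);
        // Built-in macro: "binary" expands to "-diff -merge -text" in place.
        if (token === 'binary') {
          attrs.set('diff', false).set('merge', false).set('text', false);
        }
      }
    }
    rules.push({
      matcher: globToRegExp(pattern.replace(/^\//, '')),
      matchBasename: !pattern.includes('/'),
      attrs,
    });
  }
  return rules;
}

// Later matching lines override earlier ones; an unset `diff` means binary.
export function isBinaryByGitAttributes(rules: AttrRule[], relPath: string): boolean {
  const base = relPath.slice(relPath.lastIndexOf('/') + 1);
  let diff: AttrState = undefined;
  for (const rule of rules) {
    const subject = rule.matchBasename ? base : relPath;
    if (rule.matcher.test(subject) && rule.attrs.has('diff')) diff = rule.attrs.get('diff');
  }
  return diff === false;
}

export async function getGitAttributes(repoRoot: string): Promise<AttrRule[]> {
  try {
    return parseGitAttributes(await readFile(join(repoRoot, '.gitattributes'), 'utf8'));
  } catch {
    return []; // missing or unreadable file: fall back to extension checks only
  }
}
```

Because these simplifications diverge from git's full attribute rules (nested `.gitattributes` files, `.git/info/attributes`, macros, bracket patterns), `git check-attr` remains the more faithful source whenever git is available.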

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/getappmap/appmap-js/issues/2042?shareId=XXXX-XXXX-XXXX-XXXX).
@dustinbyrne dustinbyrne linked a pull request Oct 22, 2024 that will close this issue
@dustinbyrne dustinbyrne self-assigned this Oct 22, 2024