-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Use git attributes to identify binary files #2083
base: main
Are you sure you want to change the base?
Conversation
Fixes #2042 Add support for identifying binary files using git attributes in addition to file extensions. * Add `getGitAttributes` function to read and parse the `.gitattributes` file in `packages/cli/src/fulltext/FileIndex.ts`. * Update `isBinaryFile` function to check git attributes information in addition to file extensions in `packages/cli/src/fulltext/FileIndex.ts`. * Update `filterFiles` function to use the `getGitAttributes` function to determine if a file is binary based on git attributes in `packages/cli/src/fulltext/FileIndex.ts`. * Add tests to verify that the context search/lookup/collector correctly identifies binary files using git attributes information in `packages/cli/tests/unit/fulltext/FileIndex.spec.ts`. --- For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/getappmap/appmap-js/issues/2042?shareId=XXXX-XXXX-XXXX-XXXX).
3201148
to
190762d
Compare
}; | ||
const isBinaryFile = (fileName: string, gitAttributes?: ReturnType<typeof getGitAttributes>) => | ||
BINARY_FILE_EXTENSIONS.some((ext) => fileName.endsWith(ext)) || | ||
gitAttributes?.some((attr) => attr.binary && minimatch(fileName, attr.pattern, { dot: true })); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The { dot: true }
option was missing from the demonstration this morning. This fixes the issue with dot files appearing with a **/*
pattern.
binary: attributes.includes('binary'), | ||
}; | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this is much simpler than actual gitattributes(5):
Each line in gitattributes file is of form:
pattern attr1 attr2 ...
That is, a pattern followed by an attributes list, separated by whitespaces. Leading and trailing whitespaces are ignored. Lines that begin with # are ignored. Patterns that begin with a double quote are quoted in C style. When the pattern matches the path in question, the attributes listed on the line are given to the path.
When deciding what attributes are assigned to a path, Git consults $GIT_DIR/info/attributes file (which has the highest precedence), .gitattributes file in the same directory as the path in question, and its parent directories up to the toplevel of the work tree (the further the directory that contains .gitattributes is from the path in question, the lower its precedence). Finally global and system-wide files are considered (they have the lowest precedence).
This might be fine for our purposes, but the simplification warrants a comment at least — and I think skipping comments is easy enough to just do it.
if (!existsSync(gitAttributesPath)) return []; | ||
|
||
const gitAttributesContent = readFileSync(gitAttributesPath, 'utf-8'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using fs.exists() to check for the existence of a file before calling fs.open(), fs.readFile(), or fs.writeFile() is not recommended. Doing so introduces a race condition, since other processes may change the file's state between the two calls. Instead, user code should open/read/write the file directly and handle the error raised if the file does not exist.
(Essentially, exists returning true doesn't guarantee that read will succeed, so you have to handle the error anyway, so might as well skip the exists.)
return { | ||
pattern, | ||
binary: attributes.includes('binary'), | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's only pulling binary like that, why even bother returning other patterns? Perhaps it should just return a list of binary patterns instead. This will also simplify upstream code — you can just add those to excluded patterns.
Fixes #2042
Add support for identifying binary files using git attributes in addition to file extensions.
getGitAttributes
function to read and parse the.gitattributes
file inpackages/cli/src/fulltext/FileIndex.ts
.isBinaryFile
function to check git attributes information in addition to file extensions inpackages/cli/src/fulltext/FileIndex.ts
.filterFiles
function to use thegetGitAttributes
function to determine if a file is binary based on git attributes inpackages/cli/src/fulltext/FileIndex.ts
.packages/cli/tests/unit/fulltext/FileIndex.spec.ts
.For more details, open the Copilot Workspace session.