Commit

Prevent git from rehashing 4GiB files
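The index stores file sizes using a uint32_t. This causes any file
whose size is an exact multiple of 2^32 to have a cached file size of
zero. Zero is a special value used by the racily clean logic: git
marks ("smudges") a racily clean entry by setting its cached size to
zero, which forces its contents to be re-checked later. As a result,
git rehashes every file whose size is a multiple of 2^32 every time
git status or git commit is run.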

This patch mitigates the problem by making all files that are a
multiple of 2^32 appear to have a size of 1<<31 instead of zero.

The value 1<<31 is chosen because it is as far from zero as a 32-bit
value can be (allowing for wraparound), which helps prevent patched
and unpatched versions of git from getting mixed up.

For example, suppose a patched git records a file of exactly 2^32
bytes in the index. It saves the size as 2^31 in the cache. An
unpatched git reading that index truncates the on-disk size to zero,
sees that the size has changed, and rehashes the file, which is safe.
For old git not to notice a real change, the file would have to grow
or shrink by exactly 2^31 bytes and keep its ctime, mtime, and other
attributes unchanged.

This patch does not change the behavior of any file whose size is
not a non-zero exact multiple of 2^32; empty files are still cached
with a size of zero.

Signed-off-by: Jason D. Hatton <[email protected]>
Signed-off-by: brian m. carlson <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
jhattongfs authored and gitster committed Oct 13, 2023
1 parent 678eb55 commit 5143ac0
Showing 2 changed files with 34 additions and 2 deletions.
20 changes: 18 additions & 2 deletions statinfo.c
@@ -2,6 +2,22 @@
 #include "environment.h"
 #include "statinfo.h"
 
+/*
+ * Munge st_size into an unsigned int.
+ */
+static unsigned int munge_st_size(off_t st_size) {
+	unsigned int sd_size = st_size;
+
+	/*
+	 * If the file is an exact multiple of 4 GiB, modify the value so it
+	 * doesn't get marked as racily clean (zero).
+	 */
+	if (!sd_size && st_size)
+		return 0x80000000;
+	else
+		return sd_size;
+}
+
 void fill_stat_data(struct stat_data *sd, struct stat *st)
 {
 	sd->sd_ctime.sec = (unsigned int)st->st_ctime;
@@ -12,7 +28,7 @@ void fill_stat_data(struct stat_data *sd, struct stat *st)
 	sd->sd_ino = st->st_ino;
 	sd->sd_uid = st->st_uid;
 	sd->sd_gid = st->st_gid;
-	sd->sd_size = st->st_size;
+	sd->sd_size = munge_st_size(st->st_size);
 }
 
 int match_stat_data(const struct stat_data *sd, struct stat *st)
@@ -51,7 +67,7 @@ int match_stat_data(const struct stat_data *sd, struct stat *st)
 		changed |= INODE_CHANGED;
 #endif
 
-	if (sd->sd_size != (unsigned int) st->st_size)
+	if (sd->sd_size != munge_st_size(st->st_size))
 		changed |= DATA_CHANGED;
 
 	return changed;
16 changes: 16 additions & 0 deletions t/t7508-status.sh
@@ -1745,4 +1745,20 @@ test_expect_success 'slow status advice when core.untrackedCache true, and fsmon
 	)
 '
 
+test_expect_success EXPENSIVE 'status does not re-read unchanged 4 or 8 GiB file' '
+	(
+		mkdir large-file &&
+		cd large-file &&
+		# Files are 2 GiB, 4 GiB, and 8 GiB sparse files.
+		test-tool truncate file-a 0x080000000 &&
+		test-tool truncate file-b 0x100000000 &&
+		test-tool truncate file-c 0x200000000 &&
+		# This will be slow.
+		git add file-a file-b file-c &&
+		git commit -m "add large files" &&
+		git diff-index HEAD file-a file-b file-c >actual &&
+		test_must_be_empty actual
+	)
+'
+
 test_done
