Problems on 15TB ZFS flatfs repo: mutating MFS dir with 25k entries, ipfs add hang #10588
Comments
GitHub said "File size too big: 25 MB are allowed, 50 MB were attempted to upload." So here's "ipfs-profile-2024-11-17-utc.zip": Update edit: This really is worse than expected. Add files command hanging on a small file (and sometimes oom-killing things) is one thing, but missing files is way worse. I see that the ipfs datastore indexes or whatever cannot find the following, meaning that it's completely gone or the index got messed up:
I had 100% of that folder in the past: Timeline: The point is that no filesystem or data corruption happened other than IPFS's own. That bafybeif...qm2m folder = 321 blocks (32,722,333 bytes). It corresponds to these .data files: I didn't run any gc command, and as far as I know, no gc was run in the background. There is no point in running mirrored HDDs if the data gets deleted by some software (RAID is not a backup). This is really frustrating.
Note: It is not that the 244 out of 322 blocks were deleted. Adding files to MFS does not necessarily download all the data, but rather creates references to the data. Did you explicitly preload the data after It'd be helpful to get profiles during the bad events such as:
I think your leveldb datastore blew up. Does leveldb is used to store the pinset and the mfs root, probably it is hanging there. That doesn't explain your "data missing" part, as the blocks are stored in flatfs ( |
You are referring to the 32MB folder I wrote about: probably didn't use MFS at all with that one (if I did I only copied its root to MFS after
It's basically hanging again. Command
The "Sl" status reported by |
Roughly 24 or 48 hours after About pinning being broken - in the past I was able to do this: run That go.dev tool looks helpful. I wondered about a thing to do that in the past! Anyways, I already did that: converting the CID's data into the corresponding CIQ*.data files. I said that in an above post which contains the text "It corresponds to these .data files:". How I did it:
(1.) saw that I didn't have all of a CID in "repo A" where I did in the past
(2.) downloaded a .car file of that CID from an HTTP-only website
(3.) deleted everything in "repo B" = empty repo
(4.) imported said CAR file into repo B
(5.) got a list of all .data files in repo B
(6.) checked repo A to see which ones were missing = only had 78 out of 322 of them.
(Repo A is in one computer and repo B is in a different computer.)
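Steps (5.) and (6.) above amount to a set difference between two lists of block file names. A minimal sketch of that comparison follows; it is not the exact commands that were run, and `missing_blocks` plus the two list files it takes are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch only: missing_blocks and its input files are hypothetical names.

# Print the block file names that appear in the reference list ($1, e.g. repo B)
# but not in the list from the repo being checked ($2, e.g. repo A).
missing_blocks() {
    ref_sorted=$(mktemp) || return 1
    have_sorted=$(mktemp) || { rm -f "$ref_sorted"; return 1; }
    # comm needs both inputs sorted the same way; -u also drops duplicates.
    sort -u "$1" > "$ref_sorted"
    sort -u "$2" > "$have_sorted"
    # -23 hides lines unique to the second file and lines common to both,
    # leaving only names that exist in the reference list alone.
    comm -23 "$ref_sorted" "$have_sorted"
    rm -f "$ref_sorted" "$have_sorted"
}
```

Each input list could be produced on its own machine with something like `find /path/to/repo/blocks -type f -name '*.data' | sed 's|.*/||'`, assuming the default flatfs layout where block files live under the repo's `blocks` directory. Piping the output of `missing_blocks repoB.txt repoA.txt` through `wc -l` then gives the number of missing blocks.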
Timeline: What that looks like - Terminal tab 2:
Terminal tab 1:
I then ran |
Checklist
Installation method
dist.ipfs.tech or ipfs-update
Version
Config
Description
Things were working fairly fast and OK (not great, but OK), then after a certain event a day ago things got way slower or stopped working. Setup: ZFS mirror pool of two 18 TB HDDs which mostly contains IPFS data, like 15 TB of that. Things were working OK, except that like a month ago pinning stopped working. I saw some error about "cannot fix 1800 pins" or something occasionally. A day ago I was doing this with a list of 1,105,578 IPFS CIDs (totaling 1.2 TB):
$ cat /home/u/Downloads/t5.txt | xargs -d "\n" sh -c 'for args do cid="$(echo "$args" | sed "s/ .*//g")"; fn="$(echo "$args" | sed "s/.* //g")"; date -u; ipfs files cp -p "/ipfs/$cid" "/dup/organize/4chan/mlp/$(echo "$fn" | perl -pE "s/^(....).*/\1/g")/$(echo "$fn" | perl -pE "s/^....(..).*/\1/g")/$fn"; date -u; done' _ >> /home/u/Downloads/t6.txt
What that command does: the input is many lines where each line is "[raw blocks CID] [Unix timestamp filename]" and each file is 1KB to 4MB in size. That command was running in offline mode yesterday; no ipfs daemon was running. It then puts those files in paths like this: "ipfs files cp -p /ipfs/[cid] /mfs/1419/00/1419000480902.jpg". It logs the timestamp of each "ipfs files cp" command to file "t6.txt".
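To make the path layout easier to follow, here is a minimal sketch of the same mapping written as two small functions. It is not the command that was actually run: `mfs_path` and `plan_copies` are hypothetical helper names, and the sketch only prints the `ipfs files cp` commands instead of executing them, so it can be tried without touching a repo.

```shell
#!/bin/sh
# Sketch only: mfs_path and plan_copies are hypothetical helpers, not part of Kubo.

# Build the MFS destination for a filename such as 1419000480902.jpg:
# characters 1-4 name the top directory, characters 5-6 the subdirectory.
mfs_path() {
    fn=$1
    top=$(printf '%s' "$fn" | cut -c1-4)
    sub=$(printf '%s' "$fn" | cut -c5-6)
    printf '/dup/organize/4chan/mlp/%s/%s/%s\n' "$top" "$sub" "$fn"
}

# Read "[CID] [filename]" lines from stdin and print the copy command for
# each one (dry run; nothing is written to the repo).
plan_copies() {
    while read -r cid fn; do
        # Skip blank or incomplete lines.
        [ -n "$cid" ] && [ -n "$fn" ] || continue
        printf 'ipfs files cp -p /ipfs/%s %s\n' "$cid" "$(mfs_path "$fn")"
    done
}
```

Running `plan_copies < /home/u/Downloads/t5.txt | head` would show the first few destinations before committing to the full list.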
That was the event which I think messed things up. It did 25,859 operations of copying files to MFS. After I canceled that command 24 or 48 hours ago, I have had persistent problems with my IPFS stuff. Such as the daemon not starting or hanging: ipfs/ipfs-docs#1956 - not a problem anymore. I do have the following problem; adding a small file to IPFS never finishes - this ran for like 30 minutes and didn't exit:
And as said above, pinning doesn't work, so
ipfs --offline pin add --progress bafybeibfcytwdefk2hmatub3ab4wvfyei34xkwqz5ubzrqwslxi3d5ehau
is always stuck as "Fetched/Processed 0 nodes". About the 25,859 operations before it became bad: at the start of the text file you can see that files were copied to MFS quickly and at the end it went way slower:
Like a week ago I saw some error about a dirty flag not being cleared. I have attached the output file of "$ ipfs diag profile" for more details. If there's something to be learned from this, I guess it's to not copy many files to MFS without the IPFS daemon running. I was trying to copy more than one million but only copied like 25,000. Also I've seen some weirdness with the "ipfs files" set of commands in the past (copy/move).
Related issue: #10383 titled "Improve data onboarding speed: ipfs add and ipfs dag import|export" (I recommend using raw blocks instead).