Backup using Checkpointing #52
Comments
Apache Paimon is an LSM format designed for writing to S3-compatible object storage while it's running. I really liked this author's articles on it: https://jack-vanlightly.com/analyses/2024/7/3/understanding-apache-paimon-consistency-model-part-1
Has there been any more thought on this? I'm looking at using this crate, but backups are necessary for my use case. I wouldn't mind trying out an implementation if there's been a general decision on the right approach.
The general idea is to use RocksDB-like checkpoints for online backups. I didn't want to add it in 1.x because I knew 2.x would add and change some stuff anyway.

But to fully implement it, there will need to be some synchronization, because the journal(s) need to be copied, then all the disk segments (hard-linked if possible) and metadata files, for all partitions. While the backup runs, no journal GC, journal writes, blob file rewrites, memtable flushes or compactions are allowed to complete, but reads should not be blocked. So I think there needs to be a keyspace-level RwLock...

Another way would be to do a full scan through (using a snapshot) and write it out to a flat file...

For offline backups, simply using …
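To make the RwLock idea concrete, here is a minimal sketch with invented names (none of this exists in the crate): background work takes the shared side of the lock only around the point where it publishes its result, a backup takes the exclusive side for the duration of the copy, and reads never touch the lock at all.

```rust
use std::sync::{Arc, RwLock, RwLockReadGuard, RwLockWriteGuard};

/// Hypothetical keyspace-level backup lock (not part of the crate today).
/// Background tasks hold the shared side while *committing* their results;
/// a backup holds the exclusive side, so the set of journals, disk segments
/// and metadata files cannot change underneath it. Reads never touch this
/// lock, so they are not blocked.
#[derive(Clone, Default)]
struct BackupLock(Arc<RwLock<()>>);

impl BackupLock {
    /// Taken by memtable flushes, compactions, journal GC and blob file
    /// rewrites right before they publish their result.
    fn commit_guard(&self) -> RwLockReadGuard<'_, ()> {
        self.0.read().unwrap()
    }

    /// Held by a backup while files are being copied.
    fn freeze(&self) -> RwLockWriteGuard<'_, ()> {
        self.0.write().unwrap()
    }
}

fn compaction_worker(lock: &BackupLock) {
    // ... do the (possibly long) compaction work without holding the lock ...
    let _guard = lock.commit_guard();
    // ... atomically register the new segments and retire the old ones ...
}
```

Because compactions only take the guard around their final commit, a backup waits for in-flight commits at most, not for entire compactions.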
I have added a todo list to the OP. I still have some stuff coming up for 2.4.0/2.5.0, so this isn't going to make it in there, but I want to take a look at it in the near future... unless someone wants to look into it and contribute a possible solution, even if just a draft.
Would rotating to a new journal (that is excluded from the backup) when triggering a checkpoint backup allow writes to also continue? I guess it could result in unbounded memtable and journal growth if the backup takes a long time during a period of high write throughput, though.
@Svenskunganka I think it can be more granular. Copying the active journal should be quick, so we only need to lock it briefly. Then we can unlock the active journal again and copy the sealed journals; they just must not be dropped by journal GC during all of this, for consistency. My todo list in the OP is more correct, I think.
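A rough sketch of that two-phase sequence, with made-up names for the journal handles and the GC pin (the crate's real internals look different):

```rust
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Hypothetical stand-ins for the journal state.
struct ActiveJournal {
    path: PathBuf,
}

struct SealedJournals {
    paths: Vec<PathBuf>,
}

/// Two-phase journal copy: briefly lock the active journal, copy it, unlock,
/// then copy the sealed journals while a pin keeps journal GC from deleting them.
fn backup_journals(
    active: &Mutex<ActiveJournal>,
    sealed: &SealedJournals,
    gc_pin: &AtomicUsize, // journal GC skips deletion while this is > 0
    target: &Path,
) -> std::io::Result<()> {
    // Pin first so GC cannot drop a sealed journal between the two phases.
    gc_pin.fetch_add(1, Ordering::SeqCst);

    let result = (|| {
        // Phase 1: short exclusive lock on the active journal.
        {
            let active = active.lock().unwrap();
            std::fs::copy(&active.path, target.join(active.path.file_name().unwrap()))?;
        } // lock released here; writes can continue during phase 2

        // Phase 2: copy the sealed journals without blocking writers.
        for path in &sealed.paths {
            std::fs::copy(path, target.join(path.file_name().unwrap()))?;
        }
        Ok(())
    })();

    gc_pin.fetch_sub(1, Ordering::SeqCst);
    result
}
```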
Possible API

- `Keyspace::backup_to<P: AsRef<Path>>(path: P) -> crate::Result<()>`
- `TxKeyspace::backup_to<P: AsRef<Path>>(path: P) -> crate::Result<()>` (just needs to call inner)

(When we have #70, we could also provide an offline backup method that takes an exclusive temporary lock of an unopened Keyspace and clones it file-by-file, ideally using hard links.) Example usage is sketched below.
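With the proposed `backup_to` (which does not exist yet, so this will not compile today), usage could look roughly like this; the keyspace is opened the usual way and the backup directory name is made up:

```rust
use fjall::Config;

fn main() -> fjall::Result<()> {
    // Open (or create) a keyspace as usual.
    let keyspace = Config::new("./my-db").open()?;

    // Proposed API from above: snapshot the keyspace into a directory
    // that can later be opened like a normal keyspace.
    keyspace.backup_to("./backups/2024-05-19")?;

    Ok(())
}
```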
Steps

1. Take the keyspace-level lock (`BackupLock`) to prevent garbage collection and compactions from submitting results (which would change disk segments)
2. Copy the journals, then the disk segments (hard-linked if possible) and metadata files (`version`) for all partitions; the copy step is sketched below
3. (Release the `BackupLock`, everything else can now proceed)
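A sketch of the copy in step 2, assuming the `BackupLock` is already held; the file names, extensions and layout are invented, and hard links fall back to plain copies where the filesystem does not support them:

```rust
use std::fs;
use std::path::Path;

/// Copy one partition's on-disk state into `target`.
/// Assumes the backup lock is held, so the set of files cannot change.
fn snapshot_partition(partition_dir: &Path, target: &Path) -> std::io::Result<()> {
    fs::create_dir_all(target)?;

    for entry in fs::read_dir(partition_dir)? {
        let entry = entry?;
        let src = entry.path();
        let dst = target.join(entry.file_name());

        if src.extension().is_some_and(|ext| ext == "journal") {
            // Journals are plain files that need a byte-for-byte copy.
            fs::copy(&src, &dst)?;
        } else {
            // Disk segments and metadata (e.g. a `version` file) are immutable
            // once written, so hard-link them if the filesystem allows it,
            // otherwise fall back to a full copy.
            fs::hard_link(&src, &dst).or_else(|_| fs::copy(&src, &dst).map(|_| ()))?;
        }
    }

    Ok(())
}
```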
Unsure

- How to prevent compactions from applying their result? There may be compactions going on currently. We may need to pass the `BackupLock` to all trees, and make compactors take the lock? Can this be tested reliably? Same goes for blob file rewrites.

Also discussed in #50
Originally posted by jeromegn May 19, 2024
If one wanted to do a backup of the database, what's the best practice here? Is there a way to do an "online" (not shutting down the process using the database) backup?
Since there are many directories and files, I assume the safest way is to persist from the process, exit and then create a tarball of the whole directory.
With SQLite, for example, it's possible to run a `VACUUM INTO` that will create a space-efficient snapshot of the database. SQLite can support multiple processes though, so it's a different ball game.