Question: Lifetime for blob object in openblob
#148
Comments
I'll have to read your entire issue with more care to give better suggestions, but the semantics of the callback matter here. In fact, for a single query, the callback may be called more than once. Consider this query:

SELECT openblob('main', 'entries', 'contents', (SELECT rowid FROM entries WHERE id = :id), :writeMode, :callback)

This could be written:

SELECT openblob('main', 'entries', 'contents', rowid, :writeMode, :callback)
FROM entries
WHERE id = :id

Now consider this modification:

SELECT openblob('main', 'entries', 'contents', rowid, :writeMode, :callback, id)
FROM entries
WHERE id GLOB 'id:2024-09-06T*'

If I understood your schema and id generation correctly, this would invoke your callback for every entry matching that day. This is something we can't really do if the blob handle is allowed to outlive the callback. |
Per @ncruces' comments in ncruces#148 (comment), this change clarifies the documentation for openblob to help clients understand the lifecycle of the blob handle within the callback.
@ncruces - Thanks, that's very helpful! I'll keep playing around with my implementation. |
I'm not sure I understand why you can't start with your attempt 2 and do the serving from within the callback. I definitely would expect/want that to work (obviously, only a single blob should match your query). |
I notice that http.ServeContent copies the blob with io.CopyN.
This is unfortunate, but it means that, probably, we should allocate the internal buffer only once and reuse it. This will save an allocation per call. |
Yeah, that should work. The issue in my case is that my storage component shouldn't care about HTTP, and my HTTP handling code shouldn't care about SQLite. It's easy to keep the two components loosely coupled if I can grab an io.ReadSeeker. I'll write a wrapper that abstracts the SQLite stuff behind an io.ReadSeeker interface. |
If I understand correctly: since CopyN wraps the source in a LimitedReader, which does not implement WriterTo, io.Copy won't use WriteTo even if the source passed to CopyN does implement it. |
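To make that concrete, here is a small self-contained sketch (my own illustration, not code from this thread) showing that io.Copy uses a source's WriteTo, while io.CopyN does not, because of the io.LimitedReader wrapping:

package main

import (
	"fmt"
	"io"
	"strings"
)

// trackedReader wraps strings.Reader and records whether WriteTo was used.
type trackedReader struct {
	*strings.Reader
	usedWriteTo bool
}

func (r *trackedReader) WriteTo(w io.Writer) (int64, error) {
	r.usedWriteTo = true
	return r.Reader.WriteTo(w)
}

func main() {
	src := &trackedReader{Reader: strings.NewReader("hello, blob")}
	io.Copy(io.Discard, src) // io.Copy sees the io.WriterTo and uses it
	fmt.Println("io.Copy used WriteTo:", src.usedWriteTo) // true

	src = &trackedReader{Reader: strings.NewReader("hello, blob")}
	io.CopyN(io.Discard, src, 5) // CopyN wraps src in io.LimitedReader, hiding WriteTo
	fmt.Println("io.CopyN used WriteTo:", src.usedWriteTo) // false
}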
Yes, this is the reason. The reason I'm calling this out is that, for a large blob, the extra copy becomes noticeable. |
9d77322 exposes this. The only oddball now is the remaining case. |
That allows reusing the same buffer for all calls to the blob functions. |
I'll leave this open while you work on mtlynch/picoshare#567. If/when you're closer to done and want me to tag these or other improvements, feel free to ping me. |
Been thinking a bit more about this. Really curious what the performance will be like. If we need to amortize reads better with larger buffers, something like this could work:

type BufferedReadSeeker struct {
	bufio.Reader
	s io.ReadSeeker
}

func NewBufferedReadSeeker(r io.ReadSeeker, size int) *BufferedReadSeeker {
	return &BufferedReadSeeker{Reader: *bufio.NewReaderSize(r, size), s: r}
}

func (b *BufferedReadSeeker) Reset(r io.Reader) {
	b.s = r.(io.ReadSeeker)
	b.Reader.Reset(b.s)
}

func (b *BufferedReadSeeker) Seek(offset int64, whence int) (int64, error) {
	b.Reader.Reset(b.s)
	return b.s.Seek(offset, whence)
} |
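As a usage sketch (my addition, with hypothetical names: blob stands for whatever io.ReadSeeker you get inside the openblob callback; w, req, and name come from an http.Handler):

// Wrap the blob in the buffered reader above and hand it to ServeContent.
brs := NewBufferedReadSeeker(blob, 64*1024) // 64 KiB buffer to amortize reads
http.ServeContent(w, req, name, time.Time{}, brs)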
Thanks for all your work on this! Also, just want to clarify that my donation was based on appreciation of past work, so I hope it didn't feel like there were strings attached or that I expected it to be enough to compensate you for further feature work here.

I've got a version of PicoShare working with your driver, and it works locally, but I'm seeing strange errors in my end-to-end tests and when I deploy the code to a real cloud server. I'm going to try another test, but my schedule is a bit wacky for the next few weeks/months. I tried uploading a 1.5 GB file to a PicoShare instance on fly.io with the existing code and one with the new driver.

For a bit of additional context: for PicoShare's needs, I/O speed is not as important as memory overhead. In my current implementation, where I break the file into chunks with the mattn driver, on a small 256 MB RAM cloud server things work fine until files get to be about 1 GB, and then I start seeing memory exhaustion for files larger than that (mtlynch/picoshare#355). Throwing a ton of server resources at it solves the problem, but I get a lot of reports from users about crashes or sluggish performance on large files. |
OK so... First of all, congrats! Second, thanks for the donation! Third, I feel no pressure from the donation, and I also hope you don't feel any pressure from my eagerness: please, enjoy your family time! The reason I've been this engaged is that I always envisioned the blob API being used for this exact use case... and you (accidentally?) volunteered as guinea pig. Looking over your code, I see one potential issue. |
Thanks! Okay, I tried deploying a fix to a 1x-shared-CPU-256MB-RAM fly.io server based on mtlynch/picoshare@dfbedf8, which includes your suggested changes. The app uses about 83 MiB of 213 MiB of available RAM at idle, and I can upload small files fine. When I tried uploading a 237 MB file, PicoShare crashed with a failure to allocate memory.
Does anything there stick out to you? Should I try rearchitecting it to break up the file into smaller blobs that span several rows rather than sticking files of arbitrary length into a single blob per file? |
I would not expect this to cause any memory issues. Let me look into it. I'm not sure I can replicate fly specifically without creating an account with them, but I have a small hosted VM I can try. It'll take me time though. |
This is odd. Other than that, my assumption is that, as your VM is memory strapped, and since each connection lives under the illusion that it can allocate 4GB of memory, it tries to, and eventually fails. You could try something like this, to force each connection to live with 32MB of memory:

sqlite3.RuntimeConfig = wazero.NewRuntimeConfig().
	WithMemoryLimitPages(512)

But this is more configuration than should be needed, and I'm not sure it works. I guess for memory constrained devices setting the soft/hard limits would be better, but those are disabled in the current build. This driver probably does consume more memory than mattn, and I wasted your time. 😔 |
Were you able to repro? I realized I could repro a little more easily using Docker's resource limits. I made a branch that makes the repro easier: https://github.com/mtlynch/picoshare/tree/deploy-ncruces. The script for spinning up the Docker container is in that branch.
Is 4 GB an assumption of SQLite itself or ncruces/go-sqlite3?
I just tried this, but I continue seeing the crash under resource-capped Docker.
Aw, are you sure? I feel like there has to be some way of writing an arbitrarily large file to SQLite with only 256 MB of RAM. 256 MB isn't even a hard requirement. If I can tell users that arbitrarily large files will work as long as they have at least N GB of RAM, that would work too as long as N isn't something crazy. |
No, I didn't have the time. I'll try though, as I'm interested in the wazero side of this too.
It's a Wasm assumption. Each Wasm module is like a sandbox/VM, and my SQLite module is by default configured to max out at the maximum possible Wasm memory (4G). Each connection lives in its own sandbox. But memory is allocated as needed, not reserved up front. What this makes hard is to inform SQLite of the memory restriction so it can adapt. Your machine has a lot of memory, the container has much less memory (and Go may or may not know this), Wasm thinks there's a lot of memory, and so does SQLite. But it is a fact that this is less optimized for low memory than a Cgo driver. Maybe it makes sense to push for a better blob API on the Cgo driver: mattn/go-sqlite3#239 (comment) |
One thing with the container: Go is not cgroup-aware, so it doesn't know that it doesn't have the host's full memory at its disposal when running in a container. There's an automemlimit package to help with that, or you can set GOMEMLIMIT yourself. |
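For what it's worth, a minimal way to do the latter from inside the program (my addition; the 192 MiB figure is just an example for a 256 MiB container, not a recommendation):

package main

import "runtime/debug"

func main() {
	// Equivalent to running with GOMEMLIMIT=192MiB: gives the Go runtime a soft
	// heap ceiling so it collects garbage before the container OOM-kills us.
	debug.SetMemoryLimit(192 << 20)

	// ... start the application ...
}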
If I made a PicoShare-specific fork of ncruces/go-sqlite3, would it be practical for me to cap the wasm memory to something smaller? Or is there any possibility of making this configurable or dynamic in the main implementation based on system RAM?
I don't think anyone's willing and able to do feature development on the mattn implementation. There hasn't been much activity there for a while.
Cool, I'd be interested in re-trying this if you end up implementing that. Your library seemed like a potential solution to this memory exhaustion issue I've had, but I don't need the fix urgently, as I'm able to use PicoShare fine for 99% of my needs without supporting 1 GB files. |
I already uncovered and fixed a bug in the allocator in fdfaaa8. I'll have to make a release shortly because this can cause data corruption (not silent data corruption, only in an OOM crash). You can reduce memory usage today with:

sqlite3.RuntimeConfig = wazero.NewRuntimeConfig().
	WithMemoryLimitPages(512)

But then you're likely to hit the bug above. Each Wasm page is 64K, so that's 32MB. I've got a 150MB blob inserted with as little as 4MB memory per connection, though. The problem is that this limits SQLite to X MB, but SQLite doesn't know. Ideally, you say the limit is X, and then tell SQLite "make do with X-N." This needs some additional APIs. And if you use those APIs, maybe you don't need to limit Wasm memory after all; it can grow dynamically. It's not out of the question that I have some leak somewhere, because really this shouldn't require that much memory. I should also look over your PRAGMAs to see if you're using more memory than necessary. Cloud VMs also often add an issue where you think you're using temporary files, but these are actually RAM that counts against your allotment. |
Updated the above comment. Ideally I add these new APIs, and then it's a matter of calling them and getting to a reasonably low memory configuration (smaller page cache, and/or a soft memory limit). Then you can have a few connections (like 4), each keeping under 32MB (with a soft limit of, say, 16 or 24MB), with room to spare for Go and everything else. For a low memory environment that should be plenty, and you can do that with a few PRAGMAs, as in the sketch below. |
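Something along these lines, presumably (my sketch, not a tested configuration: whether soft_heap_limit is enabled depends on the build, and the values are made up; driver.Open and Conn.Exec are used as in the examples later in this thread):

db, err := driver.Open("file:app.db?_pragma=journal_mode(WAL)",
	func(c *sqlite3.Conn) error {
		// Shrink the page cache to roughly 8 MiB (negative cache_size is in KiB).
		if err := c.Exec(`PRAGMA cache_size = -8000`); err != nil {
			return err
		}
		// Ask SQLite to try to stay under ~16 MiB of heap per connection.
		return c.Exec(`PRAGMA soft_heap_limit = 16000000`)
	},
	func(c *sqlite3.Conn) error {
		return nil // nothing special to do on close
	},
)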
I tested with this version but see the same behavior, unfortunately.
Cool! If you do add these APIs, let me know, and I'm happy to test on PicoShare again. |
I haven't tested much, but these should be in v0.18.3. I would expect setting the memory limit to behave better now. I haven't got round to testing PicoShare proper, because my work machine is severely constrained, but I plan to. As I said, I was able to insert 100MB blobs into a database with as little as 4MB of memory per connection. |
I'm guessing, if the problem persists, it might be related to this comment in your source code: "We deliberately don't use a transaction here, as it bloats memory..." With the blob API the code becomes transactional anyway, because even if you don't open a transaction, inserting the (empty) blob is a single statement (so transactional) and updating it is, again, a single statement. It might actually be cheaper to wrap them in a transaction. The transaction should spill to disk if it gets too big, though; and statement journals, which are stored as temporary files (so could end up in RAM in a container), should not be involved. So I'm still not sure what's happening here. |
Okay, thanks! I tried again with 0.18.3 and unfortunately am seeing the same crash when I deploy to Fly / constrained Docker. https://github.com/mtlynch/picoshare/tree/120278e89829335e8b95de4c8eea4eac5ef665d6
Yeah, that makes me realize I should create a simpler test case. I'm thinking my next experiment will be:
|
This was my attempt:

package main

import (
	"database/sql"
	"log"
	"os"
	"path/filepath"

	"github.com/ncruces/go-sqlite3"
	_ "github.com/ncruces/go-sqlite3/driver"
	_ "github.com/ncruces/go-sqlite3/embed"
	"github.com/ncruces/go-sqlite3/ext/blobio"
	"github.com/tetratelabs/wazero"
)

func main() {
	sqlite3.RuntimeConfig = wazero.NewRuntimeConfig().WithMemoryLimitPages(64)
	sqlite3.AutoExtension(blobio.Register)

	var err error
	db, err := sql.Open("sqlite3", "file:test.db?_pragma=journal_mode(WAL)")
	if err != nil {
		log.Panic(err)
	}
	defer db.Close()

	_, err = db.Exec(`CREATE TABLE IF NOT EXISTS files (name, data)`)
	if err != nil {
		log.Panic(err)
	}

	for _, path := range os.Args[1:] {
		func() {
			file, err := os.Open(path)
			if err != nil {
				log.Panic(err)
			}
			defer file.Close()

			info, err := file.Stat()
			if err != nil {
				log.Panic(err)
			}
			log.Println("size", info.Size())

			r, err := db.Exec(`INSERT INTO files (name, data) VALUES (?, ?)`,
				filepath.Base(path), sqlite3.ZeroBlob(info.Size()))
			if err != nil {
				log.Panic(err)
			}

			id, err := r.LastInsertId()
			if err != nil {
				log.Panic(err)
			}
			log.Println("id", id)

			_, err = db.Exec(`SELECT writeblob('main', 'files', 'data', ?, 0, ?)`,
				id, sqlite3.Pointer(file))
			if err != nil {
				log.Panic(err)
			}
		}()
	}
}

I'd also suggest testing with modes other than WAL. Not because using WAL is wrong, but because I might be leaking memory with my implementation of shared memory WAL. |
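For example (my note, not from the thread), the DSN in the sample above could be switched to a rollback-journal mode for testing; TRUNCATE is just one non-WAL mode to try:

db, err := sql.Open("sqlite3", "file:test.db?_pragma=journal_mode(TRUNCATE)")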
Cool, thanks! That's interesting. Using your simple app, I'm able to repro the crash even on a system with 24 GB of 32 GB of RAM free. Here's my branch: https://github.com/mtlynch/picoshare/tree/ncruces-simple-1

Here's the runner script:

#!/usr/bin/env bash

set -eux

rm test.db* || true

readonly BINARY_DIR='bin'
mkdir -p "${BINARY_DIR}"
readonly BINARY_NAME="${BINARY_DIR}/test-ncruces"

go build \
  -tags netgo \
  -o "${BINARY_NAME}" \
  main.go

TEST_FILE="$(mktemp)"
readonly TEST_FILE
dd if=/dev/urandom of="${TEST_FILE}" bs=1M count=512
du -h "${TEST_FILE}"

"./${BINARY_NAME}" "${TEST_FILE}"

And here's the crash:
I tried removing the WAL option from your sample program as well. |
A bit of progress. It turns out that the OOM was from this line:

sqlite3.RuntimeConfig = wazero.NewRuntimeConfig().WithMemoryLimitPages(64)

I bumped it up, and the simple test program now succeeds: https://github.com/mtlynch/picoshare/tree/ncruces-simple-2

See: https://github.com/mtlynch/picoshare/blob/ncruces-simple-2/scripts/test-under-docker

The strange thing is that any of these PRAGMAs makes it crash again:

PRAGMA busy_timeout = 5000;
PRAGMA synchronous = NORMAL;
PRAGMA wal_autocheckpoint = 0;

I've tried applying only one pragma at a time, and the test program always crashes, and I've tried with none of them enabled and the test program always succeeds with a ~1 GB file and 128M of RAM. I thought it was just luck, but I've been able to reproduce it 10x with the pragmas all disabled and with them individually enabled. Do you have any ideas why that could be happening?

The other issue I ran into was that if I tried a file larger than 1 GB, it failed with a blob-too-big error. |
Interestingly, setting the journal mode to something other than WAL avoids the crash: https://github.com/mtlynch/picoshare/blob/ncruces-simple-3/scripts/test |
Yeah, I was about to say that. This is related to the shared memory index used by WAL. There might be a bug on my end (though looking at the code, it's not apparent; I'll look into it more). So, the way this works is:
One thing that I've noticed improves the situation is to wrap the two statements in a transaction. Still, I saw the WAL indexes using 256Kb × [number of connections] for 100MB files, so it shouldn't be that bad. I really need to test your code under Linux to see if this is an OS issue. |
Right, so a transaction roughly cuts the size of the index needed in half. Still, the index for your sample 512MB file is around 2MB. I'm sure there's some space lost to fragmentation, but it's 2MB. Never checkpointing the WAL, though, makes this grow forever. So if you're doing that, something needs to checkpoint the WAL eventually. |
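To make the transaction suggestion concrete, here's a rough sketch (my addition) of the reserve-then-write steps from the sample program above, wrapped in a single database/sql transaction; the schema, imports, and writeblob call are the same as in that sample:

func insertFile(db *sql.DB, name string, file *os.File) error {
	info, err := file.Stat()
	if err != nil {
		return err
	}

	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op after a successful Commit

	// Reserve space for the blob.
	r, err := tx.Exec(`INSERT INTO files (name, data) VALUES (?, ?)`,
		name, sqlite3.ZeroBlob(info.Size()))
	if err != nil {
		return err
	}
	id, err := r.LastInsertId()
	if err != nil {
		return err
	}

	// Stream the file into the reserved blob, still inside the same transaction.
	_, err = tx.Exec(`SELECT writeblob('main', 'files', 'data', ?, 0, ?)`,
		id, sqlite3.Pointer(file))
	if err != nil {
		return err
	}
	return tx.Commit()
}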
I had missed this. The maximum size of a BLOB is 1,000,000,000 bytes. This can be raised at compile time to 2GB.

About APIs that are only available through the lower level API: if it's something you want to change for every new connection:

sqlite3.AutoExtension(func(c *sqlite3.Conn) error {
	// this runs whenever any database connection is opened
	return nil
})

Or:

db, err := driver.Open("file:test.db",
	func(c *sqlite3.Conn) error {
		// this runs whenever a connection to this database is opened
		return nil
	},
	func(c *sqlite3.Conn) error {
		// this runs before a connection to this database is closed
		return nil
	},
)

If it's something you want to do in a specific connection, at a specific moment:

conn, err := db.Conn(context.TODO())
if err != nil {
	log.Fatal(err)
}
err = conn.Raw(func(driverConn any) error {
	var c *sqlite3.Conn = driverConn.(driver.Conn).Raw()
	// we have a low level Conn, let's Backup
	return c.Backup("main", "backup.db")
})
if err != nil {
	log.Fatal(err)
} |
So I finally ran this on Linux as well. The WAL index takes around 2MB per 1GB of WAL file. So with a 1GB WAL, SQLite seems to manage fine with just 8MB: 2MB for the default page cache, 2MB for the WAL index, 1MB for the write buffer I'm allocating; the other 3MB are "overhead" (stack space, global variables, fragmentation, etc). Seems reasonable, can't really fault it. So the problem here seems to be: auto checkpointing is disabled, nothing else (Litestream) is checkpointing the WAL, and the container has very little memory.
When you put those 3 things together, yeah, SQLite is going to consume unbounded memory (and my driver adds overhead by siloing each connection). Other than that I can only stress that if you disable auto checkpoints to make Litestream happy, you must run Litestream, and maybe tweak it to monitor the database more often: benbjohnson/litestream#100. Wrapping each upload in a transaction also helps. |
For BLOBs larger than that limit, something else would be needed, like chunking across rows. I'm not sure I'm up to the task, though. |
Okay, thanks for digging so deeply into this! I'll experiment more with my sample program and PicoShare and see if I can get this working by breaking the BLOBs into chunks and using transactions.
That should be okay for PicoShare, because it's not like an app where lots of users are interacting simultaneously and you can lose important data if you throw away the last minute of database writes. In the case of PicoShare, the worst that's probably going to happen is that you lose the last file you uploaded, which probably happened anyway if PicoShare crashed. Or maybe you lose a minute of analytics, but that's not a big deal either.
Okay, that's fine. I've already implemented this for the mattn driver, so it should be pretty straightforward to modify it for your driver. |
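For reference, a rough sketch of what that chunked approach might look like with this driver (my addition: the chunks table, the 32 MiB chunk size, and the assumption that writeblob accepts any io.Reader via sqlite3.Pointer, as it did with *os.File in the earlier sample, are all hypothetical; imports as in that sample program):

// Assumed schema: CREATE TABLE chunks (file_id TEXT, seq INTEGER, data BLOB);
const chunkSize = int64(32 << 20) // 32 MiB per row

func insertChunked(db *sql.DB, fileID string, file *os.File) error {
	info, err := file.Stat()
	if err != nil {
		return err
	}

	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback()

	for seq, remaining := 0, info.Size(); remaining > 0; seq++ {
		size := remaining
		if size > chunkSize {
			size = chunkSize
		}

		// Reserve one chunk-sized blob...
		r, err := tx.Exec(`INSERT INTO chunks (file_id, seq, data) VALUES (?, ?, ?)`,
			fileID, seq, sqlite3.ZeroBlob(size))
		if err != nil {
			return err
		}
		rowid, err := r.LastInsertId()
		if err != nil {
			return err
		}

		// ...and fill it with exactly one chunk's worth of the file.
		_, err = tx.Exec(`SELECT writeblob('main', 'chunks', 'data', ?, 0, ?)`,
			rowid, sqlite3.Pointer(io.LimitReader(file, size)))
		if err != nil {
			return err
		}
		remaining -= size
	}
	return tx.Commit()
}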
What I meant was: if auto checkpointing is disabled, with two transactions (one to reserve space for the BLOB, the other to write bytes to it) the WAL grows faster, and the index needs more memory, because Litestream will not check at the "right" time and checkpoint the WAL. With auto checkpoints, if the BLOB is big enough, SQLite decides the WAL grew too much and checkpoints between the transactions. I'll close this for now as the discussion has long outlived the original question. I still fixed an important bug, improved wazero, did some BLOB improvements, and gained a contributor, so thanks! Feel free to open more issues. |
Thanks for your work on this library! It's exciting to see a library offer blob I/O support alongside the database/sql interface. I'm working on porting PicoShare to use this library, and I have some questions about the blobio extension.

The ideal semantics for me are if I can open an io.ReadSeeker to a blob in my database and then later use it how I'd like (similar to how I'd open a file handle on my local filesystem).

Attempt 1: Saving a copy of the blob outside the callback
I tried doing this:
Which results in this error:
I assume it's because I'm not allowed to use the blob outside of the callback.
Attempt 2: Only read the blob within the callback
If I perform the copy within the callback, things work correctly:
Full gist: https://gist.github.com/mtlynch/638e10675377515f15d9b086b929bb71
That said, only being able to interact with the blob from within the callback makes the semantics difficult for callers.
Context for what I'm doing
Ultimately, what I'm trying to do here is let HTTP clients write to and read files from the SQLite database by sending HTTP requests.
I currently do this with the mattn library and custom wrappers that expose an io.WriteCloser for writes and an io.ReadSeeker for reads. The mattn library doesn't support blob I/O, so instead, my wrapper breaks up files into chunks of fixed size and then stores the chunks as distinct rows in a SQLite table.
Ultimately, what I'm hoping to do with your library is use the io.Reader interface from an HTTP multipart form POST and pass that to SELECT writeblob in the blobio extension of this library. And then I'd like to take an io.ReadSeeker that I get from SELECT openblob and pass it to http.ServeContent for serving the data over HTTP. The writing part seems like it should work fine, but I'm having trouble with the reading part.
Potential path
My best idea is to write my own wrapper around the SELECT openblob call that exposes the io.ReadSeeker interface but only calls openblob when a Read call occurs, kind of like this:

Does that make the most sense? Or is there a better way to achieve what I'm trying to do?
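For reference, a minimal sketch of that wrapper idea (entirely my illustration; withBlob and size are hypothetical stand-ins for however the openblob callback ends up being plumbed through, not part of this library's API):

// lazyBlobReader exposes io.ReadSeeker but only touches the database inside
// Read: each call re-opens the blob and seeks to the remembered offset.
type lazyBlobReader struct {
	// withBlob runs fn inside an openblob callback (hypothetical helper).
	withBlob func(fn func(blob io.ReadSeeker) error) error
	offset   int64
	size     int64 // total blob size, needed for io.SeekEnd
}

func (r *lazyBlobReader) Read(p []byte) (n int, err error) {
	werr := r.withBlob(func(blob io.ReadSeeker) error {
		if _, serr := blob.Seek(r.offset, io.SeekStart); serr != nil {
			return serr
		}
		n, err = blob.Read(p) // capture the result ourselves, including io.EOF
		return nil
	})
	if werr != nil && err == nil {
		err = werr
	}
	r.offset += int64(n)
	return n, err
}

func (r *lazyBlobReader) Seek(offset int64, whence int) (int64, error) {
	switch whence {
	case io.SeekStart:
		r.offset = offset
	case io.SeekCurrent:
		r.offset += offset
	case io.SeekEnd:
		r.offset = r.size + offset
	}
	return r.offset, nil
}

A value like this could then be handed to http.ServeContent, at the cost of one openblob round trip per Read.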