Vector query consumes >115 GB of memory #5475

Closed
philrz opened this issue Nov 12, 2024 · 2 comments · Fixed by #5499

philrz commented Nov 12, 2024

Repro is with super commit fdc2852.

Test data is s3://brim-sampledata/mgbench/bench2.csup.

On my MacBook this query ran for 17+ minutes, and Activity Monitor showed the super process consuming over 115 GB of memory (so, lots of swap) before it was shot by the OOM killer (I think my system had run out of free disk to hold swap by that point). At first I assumed this meant its aggregations might require spill-to-disk functionality that has not yet been added for vectors, but maybe there's something else to it, hence the issue.

$ super -version
Version: v1.18.0-144-gfdc2852b

$ time super dev vector query '
summarize
    transfer := sum(object_size)
    by log_time
| summarize
    transfer_avg := avg(transfer),
    transfer_max := max(transfer)
| put
    transfer_avg := transfer_avg / 125000000.0,
    transfer_max := transfer_max / 125000000.0
' bench2.csup

Killed: 9

real	17m3.831s
user	18m13.154s
sys	34m19.090s
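
For readers unfamiliar with the query, here is a rough Go sketch of what it computes. It is only an illustration of the two-stage aggregation over a few made-up rows, not how super's vector runtime implements it; the `record` type, the `transferStats` function, and the sample timestamps are all hypothetical. It does show why memory might scale with the number of distinct `log_time` values: the first stage keeps one running sum per key.

```go
package main

import "fmt"

// record is a hypothetical stand-in for one row of the test data; only the
// two fields the query touches are modeled.
type record struct {
	LogTime    string
	ObjectSize int64
}

// transferStats mirrors the shape of the query: sum object_size by log_time,
// take the average and maximum of those sums, then divide both by 125000000.0.
func transferStats(recs []record) (avg, peak float64, groups int) {
	// Stage 1: one map entry per distinct log_time. This table is the part
	// whose size grows with the number of distinct keys.
	sums := make(map[string]int64)
	for _, r := range recs {
		sums[r.LogTime] += r.ObjectSize
	}
	if len(sums) == 0 {
		return 0, 0, 0
	}

	// Stage 2: a single pass over the per-key sums.
	var total, maxSum int64
	first := true
	for _, s := range sums {
		total += s
		if first || s > maxSum {
			maxSum = s
			first = false
		}
	}

	// Final "put" step: integers are converted to float64 before dividing.
	avg = float64(total) / float64(len(sums)) / 125000000.0
	peak = float64(maxSum) / 125000000.0
	return avg, peak, len(sums)
}

func main() {
	// Made-up rows: two share a timestamp, one does not.
	recs := []record{
		{"2012-03-15T00:00:01Z", 100000000},
		{"2012-03-15T00:00:01Z", 150000000},
		{"2012-03-15T00:00:02Z", 125000000},
	}
	avg, peak, groups := transferStats(recs)
	fmt.Println(groups, avg, peak)
}
```

With these three rows there are two groups (sums of 250000000 and 125000000), so the sketch reports an average of 1.5 and a maximum of 2 after the division.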

philrz added the bug label Nov 12, 2024

philrz commented Nov 21, 2024

It looks like conditions have changed for this one. The merge of #5484 happened to help a different query (mgbench bench2/q5, querying against Parquet) improve from "OOM killed" to "finishes ok", so I figured I'd give this one a retry as well. Per the repro below, as of commit 87ab6b7 (the one associated with the changes in #5484), the query no longer consumes >115 GB of memory+swap and gets OOM killed. Instead it ran for about 3 minutes, consumed a bit above 20 GB of memory, then died with the panic shown.

$ super -version
Version: v1.18.0-160-g87ab6b72

$ time super dev vector query '
summarize
    transfer := sum(object_size)
    by log_time
| summarize
    transfer_avg := avg(transfer),
    transfer_max := max(transfer)
| put
    transfer_avg := transfer_avg / 125000000.0,
    transfer_max := transfer_max / 125000000.0
' bench2.csup
panic: vector kind mismatch after coerce (&vector.Int{Typ:(*super.TypeOfFloat64)(0x131fd540), Values:[]int64{-2346609157276448523}, Nulls:(*vector.Bool)(nil)} and &vector.Const{val:super.Value{typ:(*super.TypeOfFloat64)(0x131fd540), base:(*uint8)(0xc000785c11), len:0x8}, len:0x1, Nulls:(*vector.Bool)(nil)})

goroutine 1 [running]:
github.com/brimdata/super/runtime/vam/expr.(*Arith).eval(0xc0007aad80, {0xc0be7b2a60?, 0xc0be7b05a0?, 0xc000900008?})
	/Users/phil/work/super/runtime/vam/expr/arith.go:38 +0x6f9
github.com/brimdata/super/vector.Apply(0x30?, 0x121a3ab8?, {0xc0be7b2a60?, 0xc0be7b0530?, 0x1?})
	/Users/phil/work/super/vector/apply.go:17 +0xa3
github.com/brimdata/super/runtime/vam/expr.(*Arith).Eval(0xc0007aad80, {0x121a3ab8, 0xc0be7af0e0})
	/Users/phil/work/super/runtime/vam/expr/arith.go:26 +0xe6
github.com/brimdata/super/runtime/vam/expr.(*recordExpr).Eval(0xc0004ba7e0, {0x121a3ab8, 0xc0be7af0e0})
	/Users/phil/work/super/runtime/vam/expr/recordexpr.go:38 +0x134
github.com/brimdata/super/runtime/vam/expr.(*Putter).eval(0xc0007a4e28, {0xc0be7b0580?, 0x121a3c08?, 0x2?})
	/Users/phil/work/super/runtime/vam/expr/putter.go:29 +0xa4
github.com/brimdata/super/vector.Apply(0x0?, 0xc0be7b0570?, {0xc0be7b0580?, 0x10582b49?, 0x10?})
	/Users/phil/work/super/vector/apply.go:17 +0xa3
github.com/brimdata/super/runtime/vam/expr.(*Putter).Eval(0xc0004e51e0?, {0x121a3ab8, 0xc0be7af0e0})
	/Users/phil/work/super/runtime/vam/expr/putter.go:21 +0x75
github.com/brimdata/super/runtime/vam/op.(*Yield).Pull(0xc0007aaf60, 0x0)
	/Users/phil/work/super/runtime/vam/op/yield.go:33 +0x112
github.com/brimdata/super/runtime/vam.(*Materializer).Pull(0x5a227058?, 0xf8?)
	/Users/phil/work/super/runtime/vam/materialize.go:25 +0x27
github.com/brimdata/super/zbuf.(*pullerReader).Read(0xc00079d400)
	/Users/phil/work/super/zbuf/batch.go:193 +0x85
github.com/brimdata/super/zio.CopyWithContext({0x121a36c8, 0x131fd540}, {0x5a227058, 0xc0007a4e40}, {0x12189be0, 0xc00079d400})
	/Users/phil/work/super/zio/zio.go:154 +0x5a
github.com/brimdata/super/zio.Copy(...)
	/Users/phil/work/super/zio/zio.go:146
github.com/brimdata/super/cmd/super/dev/vector/query.(*Command).Run(0x12189bc0?, {0xc000144160, 0x2, 0xc000144160?})
	/Users/phil/work/super/cmd/super/dev/vector/query/command.go:80 +0x5f4
github.com/brimdata/super/pkg/charm.path.run({0xc000638a20, 0x4, 0x4}, {0xc000144160, 0x2, 0x0?})
	/Users/phil/work/super/pkg/charm/path.go:11 +0x7f
github.com/brimdata/super/pkg/charm.(*Spec).Exec(0x131c3140, {0xc000144130, 0x5, 0x5})
	/Users/phil/work/super/pkg/charm/charm.go:71 +0x105
main.main()
	/Users/phil/work/super/cmd/super/main.go:41 +0x5b

real	2m57.710s
user	1m41.122s
sys	1m38.651s

philrz commented Nov 25, 2024

Verified in super commit 7c930f2.

The query now runs successfully without panic.

$ super -version
Version: v1.18.0-169-g7c930f2a

$ time super dev vector query '
summarize
    transfer := sum(object_size)
    by log_time
| summarize
    transfer_avg := avg(transfer),
    transfer_max := max(transfer)
| put
    transfer_avg := transfer_avg / 125000000.0,
    transfer_max := transfer_max / 125000000.0
' bench2.csup

{transfer_avg:0.0046296999419207785,transfer_max:295.028835936}

Thanks @mattnibs!
