A thing to try: assign a unique page to each slot, even if the slot is unused. What we really want is a masked scatter, but padding so that filler slots don't all alias the same empty page is much easier on the memory system.
Very strange behavior: when a batch is only partially filled, generation is about 10x slower than when the batch is completely filled.
I suspect it's because all the filler slots in the batch use page 0 of the cache, causing contention and clobbering.
To confirm this, test by allocating a unique page for each filler slot.
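A minimal sketch of that test, assuming a simplified layout where each batch slot owns exactly one cache page and real requests occupy the first slots. The function name `build_page_table` and its parameters are hypothetical, not taken from the codebase. Instead of mapping every filler slot to page 0, it hands each filler a distinct page that no real request is using:

```python
def build_page_table(active_pages, batch_size, num_pages):
    """Return one page index per batch slot (hypothetical one-page-per-slot layout).

    active_pages: pages already assigned to real requests, in slot order.
    Filler slots get distinct, otherwise unused pages, so no two slots alias
    the same page (unlike the current behavior where all fillers use page 0).
    """
    if len(active_pages) > batch_size:
        raise ValueError("more active requests than batch slots")
    used = set(active_pages)
    # Lazily walk the page pool, skipping pages owned by real requests.
    free = (p for p in range(num_pages) if p not in used)
    table = list(active_pages)
    for _ in range(batch_size - len(active_pages)):
        try:
            table.append(next(free))
        except StopIteration:
            raise ValueError("not enough free pages for filler slots") from None
    return table


# Two real requests on pages 5 and 7, batch of 4, pool of 10 pages.
print(build_page_table([5, 7], batch_size=4, num_pages=10))  # [5, 7, 0, 1]
```

If the slowdown disappears with a table like this, that points to the aliasing on page 0 rather than to the partial batch itself.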
To fix this, either allocate pages for the filler slots in the batch, or treat page_index==0 as a special case in the prefill / decode functions.
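A sketch of the second option, assuming page 0 is reserved as a padding sentinel that never holds real data (`PAD_PAGE` and `masked_write` are hypothetical names). Writes from slots mapped to the sentinel are simply skipped, which is the masked-scatter behavior mentioned in the comment above; on an accelerator this would be a masked scatter rather than a Python loop:

```python
PAD_PAGE = 0  # hypothetical convention: page 0 is reserved for filler slots


def masked_write(cache, page_table, new_values):
    """Write new_values[i] into cache[page_table[i]], skipping filler slots.

    Slots whose page index equals PAD_PAGE produce no write at all, so filler
    slots never contend on (or clobber) page 0.
    """
    for page, value in zip(page_table, new_values):
        if page == PAD_PAGE:
            continue  # filler slot: drop the write
        cache[page] = value
    return cache


cache = [None] * 4
# Slots 1 and 2 are fillers mapped to page 0; only slots 0 and 3 write.
print(masked_write(cache, [2, 0, 0, 3], ["a", "x", "y", "b"]))  # [None, None, 'a', 'b']
```

The trade-off matches the comment at the top: masking avoids spending pages on fillers, but requires every prefill / decode path to honor the sentinel, whereas padding with unique pages needs no kernel changes.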
See logs below: