
[WIP] Use HashStringAllocator for raw_vector rows in HashLookup #10349

Draft
wants to merge 1 commit into main from cz_fix_hash_memory_accounting

Conversation

czentgr
Collaborator

@czentgr czentgr commented Jun 28, 2024

The rows in HashLookup are stored in a raw_vector. raw_vector acquires aligned memory with a malloc-style allocation, which was found to contribute a large amount of unaccounted memory, mainly when using HashProbe and other operators.

This change allows a custom allocator to be supplied to raw_vector. By default, the current allocation and deallocation scheme is kept.

For the raw_vectors representing rows in HashLookup, the allocator used is the HashStringAllocator, which makes the memory accountable.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 28, 2024

netlify bot commented Jun 28, 2024

Deploy Preview for meta-velox canceled.

🔨 Latest commit: 2ece662
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/671ac4cfe8b82a00082ed753

@@ -74,7 +77,8 @@ struct HashLookup {
/// Input to groupProbe and joinProbe APIs.

/// Set of row numbers of row to probe.
raw_vector<vector_size_t> rows;
raw_vector<vector_size_t, AlignedStlAllocator<vector_size_t, simd::kPadding>>
rows;

/// Hashes or value IDs for rows in 'rows'. Not aligned with 'rows'. Index is
/// the row number.
Collaborator


There are a bunch more raw_vector fields below this in HashLookup (hashes, hits, newGroups, normalizedKeys). All of these are also proportional to the number of input rows in addInput(). We should use the HashStringAllocator for them as well.

Collaborator Author


Right, I started with one. More changes are needed to specify the templated type in various places. I started on those after pushing this PR and will continue making the changes and try them out.

@@ -118,6 +132,7 @@ class raw_vector {
}
reserve(size);
size_ = size;
allocatedSize_ = size;
Contributor


Just store the byte count from allocateData in this field.

@@ -55,8 +55,11 @@ struct TableInsertPartitionInfo {

/// Contains input and output parameters for groupProbe and joinProbe APIs.
struct HashLookup {
explicit HashLookup(const std::vector<std::unique_ptr<VectorHasher>>& h)
: hashers(h) {}
explicit HashLookup(
Contributor


No need for explicit anymore

@czentgr czentgr force-pushed the cz_fix_hash_memory_accounting branch 2 times, most recently from b413411 to 33a6c05 Compare July 15, 2024 17:10
@@ -152,6 +154,9 @@ class raw_vector {
}

private:
// Constructs a copy of 'other'. See operator=. 'data_' must be copied.
raw_vector(const raw_vector<T>& other) {}
Contributor


We can use raw_vector(const raw_vector<T>& other) = delete;
Also delete the copy assignment: void operator=(const raw_vector<T>& other) = delete;

@@ -177,7 +183,8 @@ class raw_vector {

void freeData(T* data) {
if (data_) {
::free(addBytes(data, -simd::kPadding));
auto bytes = paddedSize(sizeof(T) * allocatedSize_);
Contributor


allocatedSize_ is already the padded byte size; just pass it directly to deallocate.

Collaborator Author

@czentgr czentgr Jul 17, 2024


Yes, this was actually a bug. But it revealed a problem with the interfaces of the two allocators used.

The original implementation used a byte size, but the AlignedStlAllocator (HashStringAllocator) takes the number of elements as input and does its own padding/alignment. The problem is that the result isn't a memory block that is a neat multiple of sizeof(T), and it won't necessarily match the xsimd alignment used for the raw vectors.

So I tried to do something like this to get a number of element count (and equivalent changes in the StlAlignedAllocator to change from bytes to number of elements).

  T* allocateData(int32_t size, int32_t& capacity) {
    auto bytes = paddedSize(sizeof(T) * size);
    // Integer round-up; note that std::ceil(bytes / sizeof(T)) would not
    // work because the integer division truncates before ceil is applied.
    auto numElementsOfT = (bytes + sizeof(T) - 1) / sizeof(T);
    // The allocate function takes the number of elements for the
    // AlignedStlAllocator, not the number of bytes.
    auto ptr = allocator_.allocate(numElementsOfT);
    allocatedSize_ = numElementsOfT;
    ...
}

But I'm seeing SEGVs in the fuzzers because the raw vector for the hashes appears to read an incorrect number of rows during simd execution. My guess is that the memory layout, due to the incorrect simd alignment of the rows vector, causes this issue.

Contributor


There is also a bug in StdAlignedAllocator::allocate:

    return reinterpret_cast<T*>(aligned_alloc(kAlignment, n * sizeof(T)));

Collaborator Author


Right. I did fix that when I noticed it takes the number of elements.
There is some kind of memory corruption with the memory from the HSA. I can see that the rows vector contains sequential numbers (the row numbers), but it gets corrupted when it is freed or, occasionally, when it is read. The corruption can affect anything in the memory region allocated from the HSA: I've seen rows values corrupted; in some cases it is the delta used to calculate the actual allocated pointer, and when this corrupted value is used to find the hash in the probe it fails; the HSA header can also be corrupted, etc. The region also contains non-rows content, clearly non-numeric values (some look like ASCII strings, e.g. E_ARRAY and others).
I'm trying to figure out what is happening. The memory block previously allocated outside the HSA didn't have this kind of problem.

Contributor


Are you using the raw_vector outside the life cycle of the HSA?

Collaborator Author

@czentgr czentgr Jul 18, 2024


I don't think so. Take the RowNumber operator used in one of the failing fuzzers: the lookup_ is destructed first and the table_ with the HSA after that, so this should be ok. If the HSA were invalid we would get a SEGV in allocate/deallocate during deallocation of the raw_vector in the HashLookup.
The issue doesn't always happen.

Edit: it seems to be related to the execution of restoreNextSpillPartition. Every time it's a normal RowNumber operator allocate/deallocate, it's fine. But when it comes to the above function, which also prepares for the probe, it then fails in the actual group probe.

Collaborator Author

@czentgr czentgr Jul 25, 2024


I've found the root cause. The issue is that we pass the HSA as a regular pointer while it is a shared_ptr in the RowContainer with only a single reference. Later, when restoring the spill partition data, the hash table is cleared, which also clears the RowContainer (RowContainer::clear()). If the HSA is not shared, this also clears out the HSA, and with it the raw_vector data in the HashLookup, leading to subsequent lookup problems.

I see two possibilities:

  1. Let each operator using the HashLookup use its operator pool to create a new HSA for it. This affects the HashProbe, RowNumber, and TopNRowNumber operators. Also have the HashTable use its own memory pool when creating the HashLookup object whenever needed, so they are separate from each other.
  2. Keep a shared_ptr reference to the RowContainer HSA in the HashLookup to make sure the HSA is shared (use count will be 2) and, therefore, not cleared.

Is there a preference? Option 2 would likely involve much less allocation/deallocation of a new HSA if the HashLookup is frequently destroyed/created.

Contributor


Option 2 is probably not possible because we must clear out and release the memory when spilling is triggered. If we want to continue to use the same HSA as the row container, all these raw_vectors should also be spilled and unspilled at the same time as the row container. @xiaoxmeng Do you think this is doable?

Option 1 is possible but might have a performance issue. Maybe we can use a driver-level HSA here?

void operator=(raw_vector<T>&& other) noexcept {
data_ = other.data_;
size_ = other.size_;
capacity_ = other.capacity_;
allocator_ = std::move(other.allocator_);
Contributor


We could just do void operator=(raw_vector<T>&& other) noexcept = default

@@ -74,7 +77,8 @@ struct HashLookup {
/// Input to groupProbe and joinProbe APIs.

/// Set of row numbers of row to probe.
raw_vector<vector_size_t> rows;
raw_vector<vector_size_t, AlignedStlAllocator<vector_size_t, simd::kPadding>>
Contributor


Maybe give this thing an alias since you are going to create a lot of these. Something like HsaRawVector


stale bot commented Oct 24, 2024

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

@stale stale bot added the stale label Oct 24, 2024
@czentgr czentgr force-pushed the cz_fix_hash_memory_accounting branch from 33a6c05 to b53bcd6 Compare October 24, 2024 22:00
@czentgr czentgr removed the stale label Oct 24, 2024
@czentgr czentgr changed the title Use HashStringAllocator for raw_vector rows in HashLookup [WIP] Use HashStringAllocator for raw_vector rows in HashLookup Oct 24, 2024

Co-authored-by: Jimmy Lu <[email protected]>

Resolves: facebookincubator#10200
@czentgr czentgr force-pushed the cz_fix_hash_memory_accounting branch from b53bcd6 to 2ece662 Compare October 24, 2024 22:06