Commit
PrestoBatchSerializer should not preserve Dictionary encoding if it makes the data larger (#8688)

Summary:
Pull Request resolved: #8688

This change adds some basic heuristics which serializeDictionaryVector can use to flatten a Vector as part of serializing it, rather than preserving the Dictionary encoding.

The Vector is flattened:
* if the size of the Vector type is smaller than or equal to the size of int32_t (the type of the indices into the dictionary)
* if the Vector type is fixed width and we determine that the size of the indices + the size of the alphabet is larger than the size of the original data
* regardless of the Vector type, if the alphabet contains unique values

This helps to ensure that preserving encodings during serialization won't actually make the serialized data larger.

Reviewed By: bikramSingh91

Differential Revision: D53484809

fbshipit-source-id: c7954b827a0a8e946a67d53e5b1195184c9e8d3a
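
The three checks can be sketched roughly as follows. This is a standalone illustration, not the code in this change: the function name shouldFlattenDictionary, its parameters, and the convention that a width of 0 means "variable width" are all hypothetical. It also assumes that "the alphabet contains unique values" means every row refers to a distinct alphabet entry, so the dictionary gives no reuse.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of Velox. Returns true when flattening is
// expected to serialize no larger than keeping the dictionary encoding.
//   numRows         - number of rows (one int32_t index per row)
//   alphabetSize    - number of entries in the dictionary (the alphabet)
//   valueWidthBytes - size of one value for fixed-width types,
//                     0 for variable-width types (assumed convention)
bool shouldFlattenDictionary(
    std::size_t numRows,
    std::size_t alphabetSize,
    std::size_t valueWidthBytes) {
  constexpr std::size_t kIndexBytes = sizeof(std::int32_t);

  if (valueWidthBytes != 0) {
    // Check 1: a value is no wider than an index, so each index costs at
    // least as much as the value it replaces.
    if (valueWidthBytes <= kIndexBytes) {
      return true;
    }

    // Check 2: indices plus alphabet are larger than the flat data.
    const std::size_t encodedBytes =
        numRows * kIndexBytes + alphabetSize * valueWidthBytes;
    const std::size_t flatBytes = numRows * valueWidthBytes;
    if (encodedBytes > flatBytes) {
      return true;
    }
  }

  // Check 3 (assumed interpretation): no alphabet entry is reused, so the
  // indices are pure overhead for any type.
  return alphabetSize >= numRows;
}
```

For example, under these assumptions 1000 rows of 8-byte values with a 10-entry alphabet keep the dictionary (4080 bytes encoded against 8000 flat), while the same rows with a 600-entry alphabet are flattened (8800 bytes encoded against 8000 flat).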