Handle buffered dwrf write exception to avoid server crash #9784

xiaoxmeng · 2024-05-13T04:24:18Z

Catch exception during BufferedWriter processing to avoid server crash as ~BufferedWriter
expect either the object is aborted or all the data mutations have been flushed.

Summary: Pull Request resolved: facebookincubator#9386 CAST expression used to handle INTERVAL DAY TO SECOND values as BIGINT. This allowed invalid casts (e.g. from varchar, to double, etc.) and produced incorrect results when casting to varchar. Casting 1 second interval used to return '1000' instead of '0 00:00:01.000'. Fixes facebookincubator#9384 Reviewed By: xiaoxmeng Differential Revision: D55796600 fbshipit-source-id: 3d51a8eb18b434315bf9285f58b8f2cdbedca63d

…lid (facebookincubator#8977) Summary: `buildSelectedType()` takes a constant typeWithId type tree and applies the selector on it to return a selected new type tree. However, it modifies the input typeWithId unexpectedly because it duplicates the internal type nodes, but not the leaves(primitive types). The leaf nodes were directly moved to the result instead of copied, thus making in the original input typeWithId incomplete. ``` std::shared_ptr<const TypeWithId> typeutils::buildSelectedType( const std::shared_ptr<const TypeWithId>& typeWithId, const std::function<bool(size_t)>& selector) { return visit(typeWithId, selector); } ``` ``` std::shared_ptr<const TypeWithId> visit( const std::shared_ptr<const TypeWithId>& typeWithId, const std::function<bool(size_t)>& selector) { if (typeWithId->type()->isPrimitiveType()) { return typeWithId; // directly returned } if (typeWithId->type()->isRow()) { std::vector<std::string> names; std::vector<std::shared_ptr<const TypeWithId>> typesWithId; // constructing a new children vector std::vector<std::shared_ptr<const Type>> types; auto& row = typeWithId->type()->asRow(); for (auto i = 0; i < typeWithId->size(); ++i) { auto& child = typeWithId->childAt(i); if (selector(child->id())) { names.push_back(row.nameOf(i)); std::shared_ptr<const TypeWithId> twid; twid = visit(child, selector); // twid was pointing to the original typeWithId leaf typesWithId.push_back(twid); // twid pushed to the new vector typesWithId types.push_back(twid->type()); } } VELOX_USER_CHECK( !types.empty(), "selected nothing from row: " + row.toString()); return std::make_shared<TypeWithId>(. // Creating a new type node ROW(std::move(names), std::move(types)), std::move(typesWithId), // But with the children vector that contains the original leaves. And the original ones are now nullptr. typeWithId->id(), typeWithId->maxId(), typeWithId->column()); ``` This PR fixes the problem by making a copy of the leaf nodes as well. After this change, the original input typeWithId would still be valid. Pull Request resolved: facebookincubator#8977 Reviewed By: pedroerp Differential Revision: D55756100 Pulled By: Yuhta fbshipit-source-id: 135ac4432adef69aa240f5c0b6e550683981e84a

Summary: Fix reduceAgg bencnmark test setup by calling OperatorTestBase::SetUpTestCase in benchmark main instead of its constructor. SetUpTestCase is supposed to call once before any unit test cases run (like OperatorTestBase constructor). Correspondingly move SetUpTestCase/TearDownTestCase to public section to use for non-google unit test use case such as reduceAgg benchmark which use it for benchmark testing setup Pull Request resolved: facebookincubator#9394 Reviewed By: mbasmanova Differential Revision: D55820274 Pulled By: xiaoxmeng fbshipit-source-id: 60e72837e6c867367da72c7c90feb664ccdef49a

…#9364) Summary: Pull Request resolved: facebookincubator#9364 When evaluating shared subexpressions with TRY, we only copy the values that are valid (did not encounter exceptions) into the sharedSubexprValues. This means that the size of sharedSubexprValues may not have values for all the selected rows (in particular it may have a smaller length than rows). We need to handle this when copying values from sharedSubexprValues into the result Vector, we should only copy values that did not encounter exceptions. TRY will handle this upstream by appending nulls for the exceptions. I encountered this in fuzzing, the map function produces a result Vector that is only large enough to hold values that did not encounter exceptions and this causes an error in context.moveOrCopyResult because it was trying to copy more elements than exist. Reviewed By: Yuhta Differential Revision: D55719677 fbshipit-source-id: bfc0cc67f86902d742fcf5fd34798f2ba1eaf342

…cebookincubator#9396) Summary: Pull Request resolved: facebookincubator#9396 facebookincubator#9067 introduced an initialized flag that we include alongside the null flag for aggregations in the rows in RowContainer. Global aggregations do not use a RowContainer to create rows, rather a single row is constructed manually inside GroupingSets. When that PR was landed this logic was not properly updated, we did not allocate enough space in the row for the additional bits which could lead to null/initialized flags overwriting the aggregate values when there were a lot of them. This change fixes it so that we allocate the correct amount of space for flags for global aggregations as an immediate fix. Longer term it would be better to centralize the logic for creating rows in RowContainer (I'd like to get us back to a working state before attempting such a refactor). Reviewed By: Yuhta Differential Revision: D55823098 fbshipit-source-id: 9a15f257669273c64dc0361bf735c4472e45957d

Summary: Add early flushed bytes in writer that is due to memory reclaiming. Pull Request resolved: facebookincubator#9378 Reviewed By: xiaoxmeng Differential Revision: D55786890 Pulled By: tanjialiang fbshipit-source-id: f81dd83390032bd1b6e944bfee9ee56c0072a2ea

Summary: This PR is addressing this issue facebookincubator#9135 Pull Request resolved: facebookincubator#9228 Reviewed By: Yuhta Differential Revision: D55609322 Pulled By: xiaoxmeng fbshipit-source-id: c1921fd5551c553384fde4ffddf80ad659471094

…incubator#9377) Summary: Only collect task pause time if it is not zero Include reclaim duration in task reclamation log Pull Request resolved: facebookincubator#9377 Reviewed By: tanjialiang Differential Revision: D55786597 Pulled By: xiaoxmeng fbshipit-source-id: e35aeff84a6abee602e8ccdb5546310e2f1f762f

facebookincubator#9308) Summary: Refactor the greatest/least functions using simple function API. Also, add support for NaN comparisons for DOUBLE and REAL. NaN is the biggest according to prestodb/presto#22391 Fixes facebookincubator#3728 Pull Request resolved: facebookincubator#9308 Reviewed By: xiaoxmeng Differential Revision: D55793910 Pulled By: mbasmanova fbshipit-source-id: c389bad91197f00ced549d816a15efab5a2dd910

…okincubator#9407) Summary: Pull Request resolved: facebookincubator#9407 The approx_distinct result verifier for window fuzzer is giving false positive signals. It need to be fixed as described in facebookincubator#9347. Disable this result verifier until it is fixed. Reviewed By: kgpai Differential Revision: D55879765 fbshipit-source-id: 23d3c30cc4660ccb111ef641ed22ef111f79e6a4

…9408) Summary: When memory arbitrator aborts a query, it waits for all the driver threads of the query to finish, and then abort the query operators by freeing their memory resources. The query operator or driver thread which is currently under memory arbitration is under suspended driver state which is not counted as running. We shouldn't abort such operator as it might be under memory processing within a component such as row container in hash build, and the component might be reset by operator abort. Even though we won't continue the query procesing but the excception throw might need to access the component state on the exception return path. This PR fixes this problem by avoiding abort an operator which driver is running and under suspension state. Pull Request resolved: facebookincubator#9408 Reviewed By: bikramSingh91, oerling Differential Revision: D55883111 Pulled By: xiaoxmeng fbshipit-source-id: c5fd68401bf82be92db2b9f34b8e8a7d1a7f13c2

Summary: 1. Fix hash probe spill hang for right semi join types For right semi join (project and filter), it produce output from the probed rows in hash table after processing all the probe inputs. A probe input vector (set in Operator::input_) might be processed in multiple output calls until all the input rows have been processed. The probe side spill needs to spill output from the current probe input vector before clear the hash table and the current logic wait until the output is null. This is not correct for right semi joins as it always return null, and instead we shall check on if input_ has been reset. Add unit test to verify the fix plus join fuzzer run. 2. Disable hash probe spill if dynamic filters have been generated Hash probe might generate dynamic filters based on the hash join key ranges from build side and pushdown to upstream operators. If this has been triggered, then we can't spill from hash probe side as it might produce the incorrect result as detected by join fuzzer test. The fix is to disable hash probe spill if dynamic filters have been generated. Add unit test to verif the fix plus join fuzzer run. Pull Request resolved: facebookincubator#9400 Reviewed By: kagamiori, oerling Differential Revision: D55828946 Pulled By: xiaoxmeng fbshipit-source-id: ea4cd62abf0f9d169ad61cd5718bae4c44429e68

…r#9373) Summary: Pull Request resolved: facebookincubator#9373 ArgumentTypeFuzzer serves two purposes. 1. Given a generic function signature, generate a random valid return type. 2. Given a generic function signature and a concrete return type, generate a list of valid argument types. Consider a signature of the map_keys function: ``` map(K, V) -> array(K) ``` This signature has 2 type variables: K and V. This signature doesn’t fully specify the return type, just that it must be an array. There are many possible valid return types, but all should be of the form array(K). When asked to generate a valid return type for this signature ArgumentTypeFuzzer may return array(bigint) or array(row(...)), but it should not return ‘varchar’. Now, if we fix the return type to array(bigint) and ask ArgumentTypeFuzzer to generate valid argument types, we may get back a map(bigint, varchar) or a map(bigint, double), but we do not expect a ‘varchar’ or a map(integer, float). By specifying the return type as array(bigint) we effectively bind one of the type variables: K = bigint. At the same time we leave V unspecified and ArgumentTypeFuzzer is free to choose any type for it. To generate a return type, create an ArgumentTypeFuzzer by specifying a function signature and a random number generator, then call fuzzReturnType() method. ``` ArgumentTypeFuzzer fuzzer(signature, rng) auto returnType = fuzzer.fuzzReturnType() ``` To generate argument types for a given return type, create an ArgumentTypeFuzzer by specifying a function signature, a return type and a random number generator, then call fuzzArgumentTypes() method. ``` ArgumentTypeFuzzer fuzzer(signature, returnType, rng) if (fuzzer.fuzzArgumentTypes()) { auto argumentTypes = fuzzer.argumentTypes(); } ``` Notice that fuzzArgumentTypes may fail to generate valid signatures. This can happen if specified 'returnType' is not a valid return type for this signature. This change extends ArgumentTypeFuzzer to support signatures that use generic decimal types. Consider a signature of least function: ``` (decimal(p, s),...) -> decimal(p, s) ``` This signature has 2 integer variables: p and s. The return type is not fully specified. It can be any valid decimal type. ArgumentTypeFuzzer::fuzzReturnType needs to generate values for ‘p’ and ‘s’ and return a decimal(p, s) type. If return type is fixed, say decimal(10, 7), ArgumentTypeFuzzer::fuzzArgumentTypes() needs to figure out that p=10 and s=7 and return a random number of argument types all of which are decimal(10, 7). Consider slightly different function: between ``` (decimal(p, s), decimal(p, s)) -> boolean ``` This signature also has 2 integer variables: p and s. The return type is fully specified though. Hence, ArgumentTypeFuzzer::fuzzReturnType should always return ‘boolean’. However, when return type is fixed to the only possible value, ‘boolean’, ArgumentTypeFuzzer::fuzzArgumentTypes() may generate any valid values for p and s and return any decimal type for the arguments as long as both argument types are the same. A pair of {decimal(10, 7), decimal(10, 7)} is a valid response, as well as {decimal(18, 15), decimal(18, 15)}. Let’s also look at a the signature of the ‘floor’ function: ``` (decimal(p, s)) -> decimal(rp, 0) ``` This function has 3 integer variables: p, s, and rp. The ‘rp’ variable has a constraint: ``` rp = min(38, p - s + min(s, 1)) ``` The return type can be any decimal with scale 0. ArgumentTypeFuzzer::fuzzReturnType may return decimal(10, 0) or decimal(7, 0), but it should not return decimal(5, 2). If we fix return type and ask ArgumentTypeFuzzer to generate valid argument types, it will need to figure out how to generate values p and s such that rp = min(38, p - s + min(s, 1)). This is a pretty challenging task. Hence, ArgumentTypeFuzzer::fuzzArgumentTypes() doesn’t support signatures with constraints on integer variables. It should be noted that ArgumentTypeFuzzer::fuzzReturnType() may also need to make sure that generated ‘rp’ is such that there exist ‘p’ and ‘s’ for which the formula above is true. For this particular formula this is easy because a solution exists for any rp: p = rp, s = 0. However, this is not true in general. It might be better to not support ArgumentTypeFuzzer::fuzzReturnType() for signatures with constraints on integer variables. To fuzz argument or return types, ArgumentTypeFuzzer needs to generate valid values for integer variables. Unlike type variables, integer variables have implicit constraints. A variable that represents a precision must have a value in [1, 38] range. A variable that represents scale must have a value in [1, precision] range. The fuzzer needs a way to determine which variable represents precision, which represents scale and for scale variables it needs to figure out what is the corresponding precision. The fuzzer infers these properties from the way variables are used. It examines the types of arguments and return type to figure out what each variable represents. When encountering decimal(p, s) type, the fuzzer determines that p is precision and s is scale. When encountering decimal(p, 5) type, the fuzzer determines that p is precision that must be >= 5. When encountering decimal(10, s), the fuzzer determines that s is scale that must be in [0, 5] range. This logic is implemented in the ArgumentTypeFuzzer::determineUnboundedIntegerVariables method. Reviewed By: bikramSingh91 Differential Revision: D55772808 fbshipit-source-id: 708f202fc7270aaeaa59e28175aea5147b4a7981

…facebookincubator#9286) Summary: When the result scale of decimal multiply exceeds the precision, exception occurs during the decimal type creation. Pull Request resolved: facebookincubator#9286 Reviewed By: xiaoxmeng Differential Revision: D55915055 Pulled By: mbasmanova fbshipit-source-id: b7bd9eea73cdb6a863d955da85d4b10fd9668be4

Summary: Fixes accounting of kMetricArbitratorLocalArbitrationCountwhich was previously sometimes incremented for global arbitration. Also adds additional operator level metrics for keeping track of global and local arbitration attempts initiated by them. Pull Request resolved: facebookincubator#9398 Reviewed By: xiaoxmeng Differential Revision: D55826917 Pulled By: bikramSingh91 fbshipit-source-id: 4f7bd112ea1bc6ba07f7ce5501a0143f4fd7631f

Summary: The output size was incorrectly estimated in certain cases due to a bug. This was causing flush to happen either too early or too late depending on a query shape. Pull Request resolved: facebookincubator#9311 Reviewed By: xiaoxmeng Differential Revision: D55804404 Pulled By: arhimondr fbshipit-source-id: 6ca83c74e206a06090327855d544eb231cdbba1f

…tor#9399) Summary: Pull Request resolved: facebookincubator#9399 In Presto, the default standard error for `approx_set` (and `empty_approx_set`) is [0.01625](https://prestodb.io/docs/current/functions/hyperloglog.html#approx_set), which is different from `approx_distinct` ([0.023](https://prestodb.io/docs/current/functions/aggregate.html#approx_distinct)). In Velox we are using 0.023 for both, this causes problem (in addition to the precision loss) that when `empty_approx_set` is constant folded by the coordinator, it has a different standard error and failing the sanity check. Fix this by aligning the default error value with Presto. Reviewed By: mbasmanova Differential Revision: D55814871 fbshipit-source-id: 4ecb650d02ba004fe5b1a6369d3d44464f3427ec

…or#9140) Summary: This per test sanity check allows test to reveal failures on test level instead of on the entire suite level. This will greatly reduce the time engineers spent in locating failed tests due to memory leak. Pull Request resolved: facebookincubator#9140 Reviewed By: xiaoxmeng Differential Revision: D55043514 Pulled By: tanjialiang fbshipit-source-id: d6c37bc56fe3c802f49ad6a36ae0daa2c708f4b4

…bator#9416) Summary: Add velox.memory_pool_initial_capacity_bytes as histogram to track the query memory pool's initial capacity distribution. This indicates if the overall memory arbitration is under high query memory usage pressure. If a small query starts with sufficient amount of memory, then it can run through completion without the interference of the slow memory arbitration by some large query. Add elox.memory_pool_capacity_growth_count to track the number of query memory pool capacity growth attempts through memory arbitrations. The less number of growth, the less interference with the memory arbitration by the other queries. Both metrics can be used to evaluate the effectiveness of the followup optimizations: reserved query capacity and memory capacity growth control within memory arbitrator Pull Request resolved: facebookincubator#9416 Reviewed By: bikramSingh91, oerling Differential Revision: D55925034 Pulled By: xiaoxmeng fbshipit-source-id: fe289e43f1eb99e4b798d32cdb82103dd4d8d04b

Summary: Pull Request resolved: facebookincubator#9410 Currently the key type of `MAP` is specified as `knownTypeVariable` which refuses to bind `UNKNOWN` type, while in Presto `MAP(UNKNOWN, UNKNOWN)` is valid. Reviewed By: kagamiori Differential Revision: D55902535 fbshipit-source-id: 8c57736859d9f726b1cb7aa43cec74d60990c630

…incubator#8648) Summary: Currently we only have a warning log for double free in mmap allocator and we shall also add a metric for production alert. Pull Request resolved: facebookincubator#8648 Reviewed By: bikramSingh91 Differential Revision: D53341067 Pulled By: xiaoxmeng fbshipit-source-id: 3d0979a1621ebae1180b3d59cd98345fd0c4dff8

…#9387) Summary: This PR makes a change to remove the globally applied `-Wno-return-type`, fixes some violations. This PR originally facebookincubator#4618 disabled it. In the case of a code bug, it may become difficult to diagnose the actual cause. An example of this is ``` #include <stdio.h> bool test1() { printf("Called inner\n"); } void test() { test1(); } int main(int argc, char* argv[]) { printf("Called test\n"); test(); printf("Finished test\n"); return 0; } ``` which crashes when compiled with -O2 due to bad codegen. Pull Request resolved: facebookincubator#9387 Reviewed By: pedroerp Differential Revision: D55933467 Pulled By: mbasmanova fbshipit-source-id: da748fc5df19adb3d6e295973cc12e75ac43f2c1

Summary: Pull Request resolved: facebookincubator#9343 Reviewed By: xiaoxmeng, amitkdutta Differential Revision: D55954222 Pulled By: kgpai fbshipit-source-id: 0cafeb35f98c642e2845d114700e5214dd321327

Summary: Pull Request resolved: facebookincubator#9421 For the queries that read HLL digests from table and call `merge` on them, we used to create empty accumulator and merge it with the serialized digest, and then serialize the merged accumulator again, resulting in wasted CPU. Fix this by passing the serialized digests directly in case of abandon partial aggregation. Reviewed By: mbasmanova Differential Revision: D55935663 fbshipit-source-id: 35ace659fde3490040c52238fad41895e9f759b1

Summary: Pull Request resolved: facebookincubator#9423 Logically, HLL digest represents an array of 2^n numbers. Merging two such arrays consists of computing element-wise max of the numbers. The sizes of the arrays are always the same. Example, ``` HLL1: [1, 2, 3, 4, 5] HLL2: [5, 4, 3, 2, 1] Combined HLL: [5,4, 3, 4, 5] ``` This should be a fairly quick operation that easily lends itself to vectorization. But there is a catch. The numbers stored in HLL digest are 8-bit integers that represent the number of leading zeros in a 64-bit hash. These numbers are from [0, 63] range. Hence, require at most 6 bits. The size of the HLL array is a power of 2 (2^n), where n is a function of max standard error specified in approx_distinct. Larger values of 'n' allow for higher accuracy at the expense of using more memory. In production workloads, we often see n = 16. Storing 2^16 8-bit numbers requires 65KB (each number uses 1 byte). However, Presto uses a compact representation of the HLL digest that requires about half the memory. Compact representation includes: * One 8-bit baseline number: minimum value of all the numbers in the array * 2^n 4-bit deltas from baseline capped at 15 (maximum value representable in 4 bits): delta = min(value - baseline, 15). * A small list of overflow values. * A list of indices where overflow has occurred (value > baseline + 15) * A matching list of overflows: overflow = value - baseline - 15. For example, given HLL: [10, 2, 3, 18, 5], its compact representation is * baseline: 2 * deltas: [8, 0, 1, 15, 3] * overflow indices: [3] * overflow values: [1] There is only one value that has an overflow: 18 = 2 (baseline) + 15 (max delta) + 1 (overflow). Hence, overflow indices and values arrays have just one entry each. In practice, for HLL of size 2^16, we see ~20 overflows. Compact representation helps save memory and network bandwidth (shuffle), but requires a more sophisticated and therefore expensive processing. Merging compactly represented HLLs is a lot slower. It is possible to speed up merging of compact HLLs using SIMD instructions. The implementation here is 2-3x faster than the original scalar implementation. There are a few challenges in SIMD-zing HLL merge. First, there are no instructions to operate on 4-bit integers. The smallest supported integer is 8-bit. There is also no instruction to load 4-bit integers into SIMD registers. The approach implemented here is to process deltas in even slots separately from deltas in odd slots. We process deltas in batches of 64. Each batch is located in a 32 byte-buffer, each byte storing 2 values. Overflows are processed separately using scalar code path. Add microbenchmark to measure the speed of DenseHll::mergeWith(serialized) for 3 different values of the hash bits: 11 (default), 12 and 16. The benchmark run on Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz shows speed up of 4x, 2.5x, 25x respectively. However, end-to-end measurements show much smaller gains of 2-3x. Before: ``` ============================================================================ [...]n/hyperloglog/benchmarks/DenseHll.cpp relative time/iter iters/s ============================================================================ mergeSerialized11 25.10us 39.84K mergeSerialized12 54.29us 18.42K mergeSerialized16 569.50us 1.76K ``` After: ``` ============================================================================ [...]n/hyperloglog/benchmarks/DenseHll.cpp relative time/iter iters/s ============================================================================ mergeSerialized11 6.33us 158.02K mergeSerialized12 20.04us 49.91K mergeSerialized16 22.37us 44.70K ``` Reviewed By: Yuhta Differential Revision: D55936471 fbshipit-source-id: 61f44e5e0573907ba9bf10ae5aca00f70363ec8e

…ncubator#9433) Summary: Saving ccache is resulting in some errors finalizing the upload which is preventing scheduled jobs from running. Removing this step temporarily till its resolved. Pull Request resolved: facebookincubator#9433 Reviewed By: kgpai Differential Revision: D55987674 Pulled By: bikramSingh91 fbshipit-source-id: bc5655df1ef96dbd710871b0f1ed30b06cf41e4a

Summary: To prepare for the upcoming equality delete file read, we need to refactor the SplitReader a bit. Pull Request resolved: facebookincubator#8995 Reviewed By: xiaoxmeng Differential Revision: D55072998 Pulled By: Yuhta fbshipit-source-id: 662459041b947d51ffaa98b57a50e4ebdd5b36e3

…incubator#9434) Summary: Aggregation test is flaky on circle ci which has seen unexpected spill stats with zero partial input rows: facebookincubator#9292 Since we can't reproduce this offline, add one more log to print out the number of input rows for final aggregation. If there is non-zero final aggregation input rows, then we need to look into if task stats collection is not reliable. Eventually, we shall check spill stats based on whether spill injections have been triggered or not which is more reliable. But let's first figure out if anything else in the test is wrong first. Pull Request resolved: facebookincubator#9434 Reviewed By: mbasmanova Differential Revision: D55987679 Pulled By: xiaoxmeng fbshipit-source-id: 8789f077adc877d1762f20b391c77ab87f4e6a61

Summary: Allow specifying the file length via `FileOptions`. When opening a file for reading, this helps avoid a call to remote storage to fetch file metadata. This is one of the changes needed to support facebookincubator#9314 Pull Request resolved: facebookincubator#9313 Reviewed By: Yuhta Differential Revision: D55982134 Pulled By: mbasmanova fbshipit-source-id: b358622a974ed05eeebd0d9e63a29e8ec933c40b

…tor#9409) Summary: Pull Request resolved: facebookincubator#9409 The custom result verification for approx_distinct in window operations requires a different handling from aggregation. The reason is that the error bound for approx_distinc only applies to independent input sets, while input sets for rows in the same partition in a window operation are correlated. Therefore, this diff changes ApproxDistinctResultVerifier to only take the max error in each partition when verifying the error bound for results of window operations. This diff fixes facebookincubator#9347. Reviewed By: Yuhta Differential Revision: D55898634 fbshipit-source-id: 847a70b1bfc3870bedb768f8f86711073b32e0d2

Summary: Pull Request resolved: facebookincubator#9724 Reviewed By: zacw7 Differential Revision: D57012132 Pulled By: xiaoxmeng fbshipit-source-id: 390ec69d19206a7dac4aabc915db4b3b140076f4

…rSpec.h (facebookincubator#9720) Summary: Pull Request resolved: facebookincubator#9720 C++20 has [eliminated](https://en.cppreference.com/w/cpp/types/result_of) `result_of` in favour of `invoke_result`. It's mysterious that this code even still works, but, nevertheless, I'm fixing it. Differential Revision: D56987447 fbshipit-source-id: 0cf99659c63739911b503a3c0d97be641f98ace9

…#9737) Summary: Avoid unnecessary reclaim on non-reclaimable query which might spent time on waiting for query task to pause. Also allows arbitration to reclaim reserved capacity from the request pool itself. Pull Request resolved: facebookincubator#9737 Reviewed By: tanjialiang Differential Revision: D57055391 Pulled By: xiaoxmeng fbshipit-source-id: 34fd61a343e445c2e5a4e228f3ce54d630c65f24

Summary: Tested on MacOS 14.4.1 w/ M1 chip. * Per facebookincubator#1446 (comment), `CPU_TARGET="arm64"` is not needed any more. ``` ++ arch + '[' arm64 == arm64 ']' ``` * Add `INSTALL_PREFIX` to address dependency installation permission issues in facebookincubator#2016 CC: majetideepak Pull Request resolved: facebookincubator#9717 Reviewed By: xiaoxmeng Differential Revision: D57072389 Pulled By: bikramSingh91 fbshipit-source-id: 4a48333b5034afc4db8ffac411769323b4f77a82

Summary: When cmake 3.16 is used, build velox fails with the below error. ``` CMake Error at CMake/resolve_dependency_modules/stemmer.cmake:48 (add_library): add_library cannot create ALIAS target "stemmer::stemmer" because target "stemmer" is imported but not globally visible. Call Stack (most recent call first): CMake/ResolveDependency.cmake:43 (include) CMake/ResolveDependency.cmake:81 (build_dependency) CMakeLists.txt:557 (resolve_dependency) ``` I note [cmake doc](https://cmake.org/cmake/help/latest/command/add_library.html#alias-libraries) says, "New in version 3.18: An ALIAS can target a non-GLOBAL Imported Target". So to allow using cmake before version 3.18, this pr proposes to explicitly set GLOBAL for the imported stemmer target. Pull Request resolved: facebookincubator#9670 Reviewed By: xiaoxmeng Differential Revision: D57072511 Pulled By: bikramSingh91 fbshipit-source-id: b7a4a6221900317393ac95051355b0a70466966c

…ator#9738) Summary: The lock order issue detected by tsan is as follows: T1: task startup creates driver and constructor operators which holds the task lock T2: value operator constructor allocates memory from memory pool for parallelized value node mode which triggers memory arbitration T3: the memory arbitration grabs arbitration lock for arbitration T4: the memory reclaim tries to grabs the task lock for reclaim such as to pause the task execution This forms a deadlock and we have the logic to detect the memory pool allocation from the operator constructor in driver execution framework. However the memory pool of vectors used by value nodes is not the query memory pool but owns by the test framework. This PR moves the memory pool allocation from constructor to initialize() method as the other operator does. Pull Request resolved: facebookincubator#9738 Test Plan: The test passed 100 stress test iterations: https://www.internalfb.com/intern/testinfra/testrun/10414574171435627 https://www.internalfb.com/intern/testinfra/testrun/13229323938244819 Reviewed By: bikramSingh91 Differential Revision: D57063553 Pulled By: xiaoxmeng fbshipit-source-id: 3ad661c35bfa46693729e60420f22920f83a7ff0

…9699) Summary: facebookincubator#8715 made a modification to the hash tag in the hash value. The hash tag was changed from (32,38] to (38,44]. We should also update the documentation of the hash table to reflect this change. Pull Request resolved: facebookincubator#9699 Reviewed By: xiaoxmeng Differential Revision: D57072452 Pulled By: bikramSingh91 fbshipit-source-id: 4048ee5e536ca90d7145efad59f85cc5c33d1d0f

…facebookincubator#9713) Summary: When having N columns, PartitionIdGenerator relies on N VectorHashers to generate ValueIds, with each VectorHasher’s multiplier correctly set up. However when having multiple bool columns, it triggers a corner case that VectorHasher’s multipliers won’t be set at all. Add multiplierInitialized flag and properly set it to fix this corner case. Pull Request resolved: facebookincubator#9713 Reviewed By: xiaoxmeng, spershin Differential Revision: D57062896 Pulled By: kewang1024 fbshipit-source-id: b02002e1e920735606eb830be2a025fde5a8d101

…tor#9663) Summary: Pull Request resolved: facebookincubator#9663 - Change Driver::toString() to log more relevant info. - Change Task::toString() to log more relevant info. - Change Operator::toString() to log more relevant info. Reviewed By: xiaoxmeng Differential Revision: D56734557 fbshipit-source-id: ef21e2c0d589ce257db176ef7942283bba7b8861

Summary: When SsdFileTracker looks for regions to evict, it calculates average score of all regions and picks regions with no pins and whose scores no larger than the average as candidates. It first adds up all the scores then divided it by the number of regions to calculate the average. However, if the scores are large, the sum can overflow to a small integer, leading to a scenario where none of the individual region scores is less or equal to the average. When this happens, even when all regions are unpinned, the ssd cache file will not be able to find any candidates, blocking all cache writes. Therefore, overflow needs to be handled properly. Pull Request resolved: facebookincubator#9709 Reviewed By: xiaoxmeng Differential Revision: D56965266 Pulled By: zacw7 fbshipit-source-id: 414e67975568618a0d6ba48b947c0de73cb8103c

…r#8723) Summary: This change introduces decimal signature for min_by/max_by functions. For 2-arg version of min_by/max_by, the existing signature definition is generic and no additional changes needed. For 3-arg version of min_by/max_by, an explicit signature for decimal as compare type need to be added. The 3-arg version doesn't support complex types as compare type and this we cannot define a generic signature. **Segmentation Fault in priority_queue STL:** For min_by/max_by aggregates with 3 arguments (specifies N for top N results), we need to maintain a priority_queue (PQ). This PQ is part of the accumulator. We take care of Accumulator to be memory aligned with int128_t type but PQ itself must be 128-bit aligned or else we hit a segfault. To avoid this, we can use AlignedStlAllocator to allocate PQ elements whenever value and compare types are int128_t. This seems to fix this test failure: ``` 280: [ RUN ] MinMaxByGlobalByAggregationTest.decimalSignatureTest 280: *** Aborted at 1707784514 (Unix time, try 'date -d 1707784514') *** 280: *** Signal 11 (SIGSEGV) (0x0) received by PID 27168 (pthread TID 0x7f75aa623700) (linux TID 85440) (code: 128), stack trace: *** 280: (error retrieving stack trace) ``` Testing: Added unit tests for decimals. Also, extended the existing tests to include HUGEINT type. Some of the min_by/max_by unit tests also test plan with tableScans. But, due to this issue: facebookincubator#7775, HUGEINT type is not tested for tableScans. This PR: facebookincubator#8910 already skips testing tableScans if type is not supported by DWRF. Pull Request resolved: facebookincubator#8723 Reviewed By: amitkdutta Differential Revision: D57087211 Pulled By: kagamiori fbshipit-source-id: 42b3f2af75544f70f6484e9160e0378575eb9687

…r sizes is nullptr (facebookincubator#9725) Summary: Pull Request resolved: facebookincubator#9725 `offset_` or `sizes_` can be `nullptr` if it is multi-referenced and cleared during `prepareForReuse`, in that case the old code results in crash. Reallocate them using `mutableOffsets` or `mutableSizes` in `copyRangesImpl`. Reviewed By: kevinwilfong Differential Revision: D57012135 fbshipit-source-id: 85df434f3086ac0362b220245476b88dd54fd83b

…#9425) Summary: In Gluten, both presto functions & spark functions are registered. But, there are some conflict issues. For example, when we expect to call a vector function from spark registry, a simple function with same signature from Presto registry can be called unexpectedly. So it's better to let upper framework register its own scalar function set. If some presto functions are re-usable, we need to explicitly register it for spark as this PR proposed. Pull Request resolved: facebookincubator#9425 Reviewed By: kagamiori Differential Revision: D57111008 Pulled By: pedroerp fbshipit-source-id: 8abc99989d03318b3e6925e9738883c8eb3d63d3

Summary: Return of kernel without __syncthreads() does not imply immediate visibility of writes to next kernel or host accessing unified memory. Add a barrier at kernel ends. Add GPU implementation of scatterBits. Pull Request resolved: facebookincubator#9745 Reviewed By: Yuhta Differential Revision: D57109409 Pulled By: oerling fbshipit-source-id: ea29cd8fc81dacf02f41562ec945a163d217f170

Summary: Pull Request resolved: facebookincubator#9703 X-link: facebookincubator/nimble#48 Modify interface to accept rowsToSkip. Readers will soon support it. Reviewed By: Sullivan-Patrick Differential Revision: D56953541 fbshipit-source-id: 8280a139347c60e5c918eb3bd817eb8295b2b19f

Summary: Pull Request resolved: facebookincubator#9706 I'll use it from somewhere else too. Reviewed By: kevinwilfong, Sullivan-Patrick Differential Revision: D56962043 fbshipit-source-id: b2dbe8fd09e21a77878941eeed330aec941bd669

Summary: When TTL control ages out old cache entries and evict out the entire regions from a ssd file, it tries to add those evicted region back to the writable regions and this potentially could cause duplicate writable regions as the evicted region here might not necessarily be non-writable region. Pull Request resolved: facebookincubator#9746 Reviewed By: xiaoxmeng Differential Revision: D57114817 Pulled By: zacw7 fbshipit-source-id: 7cc41c186d78fc94b5751368e12e2d38e3c226d7

…9718) Summary: We now use alwaysFalse of Filter.h and some method not implemented causing runtime error. So, we'd like to implment some alwaysFalse method to avoid the error. Pull Request resolved: facebookincubator#9718 Reviewed By: Yuhta Differential Revision: D57072475 Pulled By: bikramSingh91 fbshipit-source-id: a8986387f044cdf6f21096d06c5fdfc070aba09a

Summary: Pull Request resolved: facebookincubator#9751 To have visibility into any possibly hanging drivers. Reviewed By: xiaoxmeng Differential Revision: D57120114 fbshipit-source-id: 6df2f8c33d028d428ecb1776aa444ece564bbc78

…r#9723) Summary: Added memory allocator and cache stats to PeriodicStatsReporter. Added documentations to monitoring doc. Pull Request resolved: facebookincubator#9723 Reviewed By: xiaoxmeng, zacw7 Differential Revision: D57085696 Pulled By: tanjialiang fbshipit-source-id: 7dbc3791d314212e0c7658d0d868dac074f34fcf

…kincubator#9734) Summary: Hash join run spill in parallel by using async worker. The spill work might trigger memory allocation from non-spill memory pool such as lazy io triggered when materializing the column vector to write out. We do bypass memory arbitration if the memory allocation is for arbitration by thread-local context. However, async worker doesn't set it when running on background executor. This causes the recursive arbitration which will deadlock. This PR fixes the issue by providing createAsyncMemoryReclaimTask utility which helps setup memory arbitration context and uses it hash join spill. Unit test is added for verification. Pull Request resolved: facebookincubator#9734 Reviewed By: bikramSingh91, tanjialiang, tonyxug, oerling Differential Revision: D57092581 Pulled By: xiaoxmeng fbshipit-source-id: 8c3ab0d838214cf8bb8b657be395c4690c954dd1

…ebookincubator#9533) Summary: According to the spec of list backward compatibility [link](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists). `If the repeated field is a group with multiple fields, then its type is the element type and elements are required.` Currently Parquet reader assumes list type only has one child, which would fail for this backward compatibility case. This change makes it that when there are multiple fields, creating a new row type instance which has all the fields as its children. This diff is co-authored with qqibrow . Pull Request resolved: facebookincubator#9533 Reviewed By: mbasmanova Differential Revision: D56783371 Pulled By: Yuhta fbshipit-source-id: 50a03e55e8b49ae6f82ca8ae4101b953b60ba5e8

…ookincubator#9749) Summary: Pull Request resolved: facebookincubator#9749 In order to reuse this base class in nimble, we need to move it to `tests/utils` which will be included in `VELOX_BUILD_MINIMAL_WITH_DWIO`. Also factor out `ColumnSelector::fromScanSpec` so that we can reuse this logic in nimble without depending on `HiveConnectorUtil`. Reviewed By: pedroerp Differential Revision: D57114400 fbshipit-source-id: 406413afb924a85d1a6c93c59c8ca5d755f0b725

Summary: Pull Request resolved: facebookincubator#9760 Reviewed By: mbasmanova Differential Revision: D57162713 fbshipit-source-id: 5a19fdd0cc0917cfde5cd8fe7654bd79aac36523

) Summary: Spark support people query row_index metadata column of parquet file. Checking this spark part implement -apache/spark@95aebcb however, velox doesn't support row_index metadata, below spark query would return null for row index column ``` select a, _tmp_metadata_row_index from table; ``` related issue facebookincubator#9165 The PR introduces a new column handle type `kRowIndex` which can be used to indicate which column is row index column need be generated if you want to add a new column to the results containing the row numbers. The new column contains row number of type `BIGINT` in the file starting from 0 before any filtering and mutation, and works on all file formats supported. Pull Request resolved: facebookincubator#9174 Reviewed By: mbasmanova Differential Revision: D56472291 Pulled By: Yuhta fbshipit-source-id: 848693a9ccc5ee5e3279f012d6198721b7691d6f

Catch exception during BufferedWriter processing to avoid server crash as ~BufferedWriter expect either the object is aborted or all the data mutations have been flushed.

mbasmanova and others added 30 commits April 5, 2024 12:18

Remove unused headers in Plan.h (facebookincubator#9343)

d56f89e

Summary: Pull Request resolved: facebookincubator#9343 Reviewed By: xiaoxmeng, amitkdutta Differential Revision: D55954222 Pulled By: kgpai fbshipit-source-id: 0cafeb35f98c642e2845d114700e5214dd321327

xiaoxmeng and others added 25 commits May 6, 2024 23:46

SSD cache code cleanup (facebookincubator#9724)

668d578

Summary: Pull Request resolved: facebookincubator#9724 Reviewed By: zacw7 Differential Revision: D57012132 Pulled By: xiaoxmeng fbshipit-source-id: 390ec69d19206a7dac4aabc915db4b3b140076f4

Remove lint warnings in ParquetReaderTest (facebookincubator#9760)

3e98d40

Summary: Pull Request resolved: facebookincubator#9760 Reviewed By: mbasmanova Differential Revision: D57162713 fbshipit-source-id: 5a19fdd0cc0917cfde5cd8fe7654bd79aac36523

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 13, 2024

Handle buffered dwrf write exception to avoid server crash

8c2ca34

Catch exception during BufferedWriter processing to avoid server crash as ~BufferedWriter expect either the object is aborted or all the data mutations have been flushed.

xiaoxmeng force-pushed the writer branch from 04d5bf2 to 8c2ca34 Compare May 13, 2024 04:26

xiaoxmeng closed this May 13, 2024

xiaoxmeng deleted the writer branch May 13, 2024 04:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle buffered dwrf write exception to avoid server crash #9784

Handle buffered dwrf write exception to avoid server crash #9784

xiaoxmeng commented May 13, 2024

Handle buffered dwrf write exception to avoid server crash #9784

Handle buffered dwrf write exception to avoid server crash #9784

Conversation

xiaoxmeng commented May 13, 2024