[Design] Add support for DELETE #2452
Replies: 3 comments
-
@dingqiaoyi Thank you for the detailed write-up. I do not fully understand it yet, hence some questions. The design above lists a few DELETE query patterns. Would it be possible to provide example queries for each pattern? This would help clarify what "simple" and "complex" correlated subqueries mean. Also, would you clarify what "(pull-up Execs from subquery)" means?
Just to clarify, Velox takes a single fragment as input, not the whole distributed plan.
This requires the scan to be collocated with the join and therefore limits the set of supported queries. These queries would need to either broadcast the join build side (which implies that the build side is relatively small) or make sure that both the probe and build sides are bucketed on the join key. I wonder what you think about these limitations and whether it is worth calling them out explicitly in the design proposal above.
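To make the limitation concrete, here is a minimal sketch of the eligibility check described above: a join can stay in one fragment with the scan only if the build side is broadcast or both sides are bucketed on the join key. Everything here (`TableLayout`, `collocation_strategy`, the row limit) is a hypothetical illustration, not part of Velox or Presto.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class TableLayout:
    """Hypothetical description of how one join input is distributed."""
    name: str
    estimated_rows: int
    bucketed_by: Optional[Tuple[str, ...]] = None  # bucketing columns, if any
    bucket_count: Optional[int] = None


def collocation_strategy(
    probe: TableLayout,
    build: TableLayout,
    probe_keys: Sequence[str],
    build_keys: Sequence[str],
    broadcast_row_limit: int = 1_000_000,  # assumed threshold, purely illustrative
) -> Optional[str]:
    """Return how the join can run next to the scan, or None if it cannot."""
    # Option 1: the build side is small enough to copy to every worker.
    if build.estimated_rows <= broadcast_row_limit:
        return "broadcast"
    # Option 2: both sides are bucketed on the join key with matching bucket counts,
    # so matching rows already live in the same bucket.
    if (
        probe.bucketed_by is not None
        and build.bucketed_by is not None
        and probe.bucketed_by == tuple(probe_keys)
        and build.bucketed_by == tuple(build_keys)
        and probe.bucket_count == build.bucket_count
    ):
        return "bucketed"
    # Otherwise a shuffle is needed, which breaks the single-fragment assumption.
    return None
```

A query whose inputs return `None` here would fall outside the set of supported DELETE queries under the collocation requirement.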
Would you give some examples to help understand this statement?
Would you clarify a bit what you mean by "usually have to interact with optimizer and DAG/task manager" and how you are thinking of implementing this logic "as operators and reusing the data channels"? Some examples would be helpful.
-
Hi @dingqiaoyi, we're also looking at a commit protocol for DMLs, and I like the idea of abstracting it into an SPI, WriteCommitProtocol. It would be nice if you could include more specifics on how DELETE works at the level of TableWriter. I'm not sure it's necessary to introduce an additional TableFinish operator to Velox execution. If it's intended for distributed commit, can we do it in TableWriter and save an additional operator? As you might know, Presto today has an implementation of the TableFinish commit operator, which is scheduled to run on the centralized coordinator with parallelism set to 1, as you pointed out, and primarily performs the Metastore commit, which has to run on the coordinator. It would be too much to add Metastore support to Velox. Another aspect I'd like your thoughts on: should the commit protocol be at the level of the connector or the catalog? For example, the Hive connector could operate on HDFS or a local file system, and we might want a different commit protocol for each.
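To illustrate the connector-versus-catalog question, here is a minimal sketch in which a single connector picks its commit protocol from the storage scheme of the table location. Only the name `WriteCommitProtocol` comes from the proposal; the method signature, the two implementations and `protocol_for` are assumptions made up for this example, and no real file I/O is performed.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type
from urllib.parse import urlparse


class WriteCommitProtocol(ABC):
    """Hypothetical SPI: turns staged files into visible table files."""

    @abstractmethod
    def commit(self, staged_files: List[str]) -> List[str]:
        """Return the file names that are visible after the commit."""


class LocalRenameCommit(WriteCommitProtocol):
    """Assumed local-file-system strategy: rename staging files into place."""

    def commit(self, staged_files: List[str]) -> List[str]:
        # Only computes the final names; a real implementation would rename files.
        return [f[: -len(".tmp")] if f.endswith(".tmp") else f for f in staged_files]


class HdfsManifestCommit(WriteCommitProtocol):
    """Assumed HDFS strategy: publish a manifest instead of renaming files."""

    def commit(self, staged_files: List[str]) -> List[str]:
        return list(staged_files) + ["_manifest"]


# Registry keyed by storage scheme rather than by connector or catalog name.
_PROTOCOLS: Dict[str, Type[WriteCommitProtocol]] = {
    "file": LocalRenameCommit,
    "hdfs": HdfsManifestCommit,
}


def protocol_for(table_location: str) -> WriteCommitProtocol:
    """Pick a commit protocol from the location's scheme (plain paths count as local)."""
    scheme = urlparse(table_location).scheme or "file"
    if scheme not in _PROTOCOLS:
        raise ValueError(f"no commit protocol registered for scheme {scheme!r}")
    return _PROTOCOLS[scheme]()
```

In this sketch the choice is made per table location, so one Hive-style connector could serve both HDFS and local tables. Binding the protocol to the catalog instead would amount to moving the registry lookup into catalog configuration.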
-
Background
We are making impressive progress on query functionality and performance, but support for DML statements such as INSERT INTO, DELETE, UPDATE and MERGE INTO/UPSERT remains limited.
It is important for Velox, as a unified execution engine, to support both read and write workloads.
I believe adding DELETE support is a good start, and I hope it lays a solid foundation for supporting other DMLs in Velox.
This document presents an overall design for DELETE support in Velox.
Goals
Design
1. Writes at row level
Execution engine
Storage engine/Table format protocol
2. Transactional writes support
3. Avoid lock-in considerations
Tasks
Plan