Rebase twitter's commit onto prestodb master #248

beinan · 2020-04-21T20:03:07Z

No description provided.

Soft memory limits are default memory limits given to each query that can be overridden using session properties up to the hard limit set by the existing configuration properties. Having soft limits makes it easier to migrate a workload to lower memory limits by allowing only the queries that require higher limits to specify them while defaulting other queries to lower limits.

Available soft memory limit configuration properties:

- query.soft-max-memory-per-node
- query.soft-max-total-memory-per-node
- query.soft-max-total-memory
- query.soft-max-memory
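The rule described above (default to the soft limit, allow a session override, cap at the hard limit) can be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical and are not taken from the Presto code base, and limits are modeled as plain byte counts.

```java
import java.util.OptionalLong;

// Hypothetical illustration only; not the Presto implementation.
final class SoftMemoryLimitSketch {
    private SoftMemoryLimitSketch() {}

    // Returns the limit (in bytes) applied to a query:
    // the session override if present, otherwise the soft limit,
    // and in either case never more than the hard limit.
    static long effectiveLimit(long hardLimit, long softLimit, OptionalLong sessionOverride) {
        long requested = sessionOverride.orElse(softLimit);
        return Math.min(requested, hardLimit);
    }
}
```

Under this sketch, a query that sets nothing runs with the soft limit, and a query that asks for more than the hard limit is still held to the hard limit.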

Adds a configuration property that selects the compression codec for the ORC and DWRF storage formats. Use hive.orc_compression_codec to override the generic compression codec for these two formats. The extra property is needed because not every compression codec is supported by every storage format: the ZSTD compression codec is only available for ORC and DWRF.
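A rough sketch of how such an override could be resolved is shown below. The enum and method names are hypothetical, and the fallback to the generic codec when no ORC-specific codec is set is an assumption made for the example, not something this commit message states.

```java
import java.util.Optional;

// Hypothetical illustration only; names do not come from the Presto code base.
final class CompressionCodecSketch {
    enum StorageFormat { ORC, DWRF, PARQUET, RCBINARY }
    enum Codec { NONE, SNAPPY, GZIP, ZSTD }

    private CompressionCodecSketch() {}

    // For ORC and DWRF, prefer the format-specific codec when one is configured.
    // Assumption: every other case falls back to the generic codec.
    static Codec resolve(StorageFormat format, Codec genericCodec, Optional<Codec> orcCodec) {
        boolean orcFamily = format == StorageFormat.ORC || format == StorageFormat.DWRF;
        if (orcFamily && orcCodec.isPresent()) {
            return orcCodec.get();
        }
        return genericCodec;
    }
}
```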

Adds a parent abstract class to PrestoS3FileSystemMetricsCollector so that other SDK clients can share the metrics collector support. Adds reporting for client retry pause time, indicating how long the thread was asleep between request retries in the client itself. Fixes the reporting of client timings: previously, when the client retried a request, only the first request's timings were recorded in the stats. Now all request timings are reported individually.

Previously, an instance of PrestoS3FileSystemStats was created in PrestoS3ClientFactory, which meant it did not report S3 client stats to the instance registered with JMX. This only affected PrestoS3Select clients. Now the same metrics instance is shared with PrestoS3FileSystem.

TestRowBasedSerialization sometimes fails when it calls createRandomLongDecimalsBlock with fewer than 10 positions. Blocks with fewer than 10 positions should be allowed when they are needed. This commit removes the check that enforced a minimum positionCount and adds comments suggesting a larger positionCount when the desired nullRate > 0.

shixuan-fan and others added 30 commits April 21, 2020 13:37

Fix writing tables with preferred ordering using temp path

e1ddc0e

When SORTED_WRITE_TO_TEMP_PATH_ENABLED is true, we would require a temporary path for sorted writes.

Use field name for Parquet schema mismatch checking

5368a56

add unit test for parquet schema mismatch checker and fix code review…

746c290

… issues

Make parquet's schema check more tolerant for adding and removing sub…

7a01f2b

…-field in struct

Clean up PlanOptimizerProvider while dropping connector

6aac57f

Refactor flattenCollection to GeometryUtils

fc81c05

We have need for this function in several places, and it is purely geometric.

Remove unused code from TestRowBasedSerialization#testRandomBlocks

22a421b

Display built-in functions first in SHOW FUNCTIONS

90e7405

In SHOW FUNCTIONS results, list the built-in functions first, and then the SQL functions, in alphabetical order of the qualified function names.
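The ordering described here can be expressed as a two-level comparator. The sketch below is illustrative only: FunctionEntry and its fields are hypothetical, and it assumes the alphabetical ordering applies within each group (built-in first, then SQL functions).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not the actual SHOW FUNCTIONS code.
final class ShowFunctionsOrderSketch {
    static final class FunctionEntry {
        final String qualifiedName;
        final boolean builtIn;

        FunctionEntry(String qualifiedName, boolean builtIn) {
            this.qualifiedName = qualifiedName;
            this.builtIn = builtIn;
        }
    }

    private ShowFunctionsOrderSketch() {}

    // Built-in functions come first; within each group, names are sorted alphabetically.
    static List<String> orderedNames(List<FunctionEntry> functions) {
        List<FunctionEntry> sorted = new ArrayList<>(functions);
        sorted.sort(Comparator
                // false sorts before true, so "not built-in == false" puts built-ins first
                .<FunctionEntry, Boolean>comparing(f -> !f.builtIn)
                .thenComparing(f -> f.qualifiedName));
        List<String> names = new ArrayList<>();
        for (FunctionEntry entry : sorted) {
            names.add(entry.qualifiedName);
        }
        return names;
    }
}
```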

Fix QueryResource for queued queries

25fe380

Allow queued queries to be preempted and canceled

4405e0c

add presto-druid connector to server tarball

353853b

Fix float value for druid connector

7a988d5

escape keyword for druid column name

a20247b

Add session property for druid connector

34b3f6a

Fix query client timeout

0e6c5ff

Add release notes for 0.234.1 and 0.234.2

8f46c25

Refactor HiveWriterFactory

b4f826e

Minor variable renames

Rename PageSinkProperties#isPartitionCommitRequired

2c9385a

Page sink commit mechanism is a general connector capability and is not restricted only for partition commit.

Rename partition / lifespan commit into page sink commit

8df5254

It can be used not only to commit lifespans or physical partitions. In fact it can be used to commit any page sink write.

Introduce PageSinkCommitStrategy

c93c8ce

Co-authored-by: Andrii Rosa <[email protected]>

Support table write commit in Presto on Spark

2bfddd0

Tasks in Spark are often retried and run speculatively, so a commit protocol is required for table writes to avoid data corruption.

Co-authored-by: Andrii Rosa <[email protected]>

Add footer to PageFile format

1cc59eb

A footer consists of two parts:

- the offset of each stripe's start location
- the footer's total size in bytes
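One way such a footer could be laid out and read back is sketched below. The byte layout is an assumption made for illustration (8-byte stripe offsets followed by a trailing 4-byte footer size that counts the whole footer, big-endian); the actual PageFile encoding may differ, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical layout for illustration: [stripe data...][offset per stripe (long)][footer size (int)]
final class PageFileFooterSketch {
    private PageFileFooterSketch() {}

    // Writes all stripes back to back, then a footer with each stripe's start offset
    // and, last, the footer's total size in bytes (including the size field itself).
    static byte[] write(List<byte[]> stripes) {
        int dataSize = 0;
        for (byte[] stripe : stripes) {
            dataSize += stripe.length;
        }
        int footerSize = stripes.size() * Long.BYTES + Integer.BYTES;
        ByteBuffer out = ByteBuffer.allocate(dataSize + footerSize);
        long[] offsets = new long[stripes.size()];
        for (int i = 0; i < stripes.size(); i++) {
            offsets[i] = out.position();
            out.put(stripes.get(i));
        }
        for (long offset : offsets) {
            out.putLong(offset);
        }
        out.putInt(footerSize);
        return out.array();
    }

    // Reads the trailing footer size first, then walks back to recover the stripe offsets.
    static List<Long> readStripeOffsets(byte[] file) {
        ByteBuffer in = ByteBuffer.wrap(file);
        int footerSize = in.getInt(file.length - Integer.BYTES);
        int footerStart = file.length - footerSize;
        int stripeCount = (footerSize - Integer.BYTES) / Long.BYTES;
        List<Long> offsets = new ArrayList<>();
        for (int i = 0; i < stripeCount; i++) {
            offsets.add(in.getLong(footerStart + i * Long.BYTES));
        }
        return offsets;
    }
}
```

Because the footer size sits at a fixed position at the end of the file, a reader can locate the offsets without scanning the data, which is the kind of property a splittable read relies on. A file with no stripes still carries a footer under this layout.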

Create zero row file for PageFile format

f8fac99

Support splittable read of PageFile

0a0c6ff

Fix "bound must be positive" exception when creating DictionaryBlock

ca3f93f

beinan and others added 26 commits May 8, 2020 21:15

fix bugs on by-pass authentication

1fd6458

Fix the null value bug for the columns with the name of sql keywords

a2006df

fix CTAS failures when using viewfs

cbf81c1

Ignore not found files

a2a9958

Fix no codec issue for lzo index files.

aead098

We skip the index files.

Add dependency of javax.annotation into presto-catch's pom to fix bui…

e350273

…ld failure on java 11

Fix incompatible types error on jdk 9 and 11

2b6c0d1

Add a counter stats for namenode ops (twitter-forks#240)

cbd4ea9

Handle empty structs by inserting an unknown field

a6d9f6b

Compare type by (name,type) pair rather than (index,type) pair during…

b596173

… Parquet schema mismatch checking (twitter-forks#245) * Compare type by (name,type) pair rather than (index,type) pair during Parquet schema mismatch checking * add unit test for parquet schema mismatch checker
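The difference between matching by position and matching by name can be illustrated with a small sketch. The Field class and the method below are hypothetical and types are modeled as plain strings. It also assumes that a table field absent from the file is tolerated rather than reported, and that names are matched exactly; the real checker may behave differently on both points.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the actual Parquet schema checker.
final class SchemaMismatchSketch {
    static final class Field {
        final String name;
        final String type;

        Field(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    private SchemaMismatchSketch() {}

    // Returns the names of table fields whose type differs from the file field
    // with the same name. Field order in the file does not matter.
    static List<String> mismatchedByName(List<Field> tableFields, List<Field> fileFields) {
        Map<String, String> fileTypes = new HashMap<>();
        for (Field field : fileFields) {
            fileTypes.put(field.name, field.type);
        }
        List<String> mismatches = new ArrayList<>();
        for (Field field : tableFields) {
            String fileType = fileTypes.get(field.name);
            // Assumption: a field missing from the file is not treated as a mismatch.
            if (fileType != null && !fileType.equals(field.type)) {
                mismatches.add(field.name);
            }
        }
        return mismatches;
    }
}
```

With an (index,type) comparison, a file whose columns are merely reordered would be flagged; with the (name,type) comparison above it is not.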

rebase to 234, fix compile error and refactoring

ed07cfd

add druid module to presto-twitter-server

4de509c

fix by-pass authentication for presto 0.234+

4bcd21e

Add configuration for hdfs config resources

cbf2f56

Fix json type deserialization issue on DruidSegmentInfo

572fa2a

Add dependency of fastutil for Druid HDFS file parsing

b3d309a

Support query columns with 'Other' type

fe06bc6

Escaping special characters in druid table name and column name.

8841530

Add the problematic file path to the presto exception message

63b111a

Add presto-druid into presto-product-test config properties

ba6cef4

Fix compile errors after rebase 236 master

e27d441

Exclude transitive dependency to fix build error

b20c191

Fix the exception of 'Internal file \"column_name\" doesn't exist'

8813de0

Improve JSON serialization for the http requests to Druid

18dd19f

Escaping variable name and the column name inside distinct count.

cc6ef24

Fix unit test

Add unit test case for druid.hadoop.config.resources

3ff4f01

beinan force-pushed the prestodb-twitter-234-druid branch from cf7de87 to 3ff4f01 Compare May 9, 2020 04:24

beinan added 3 commits May 9, 2020 15:44

Refactoring and fix code review issue

0a9a165

[Ignore when rebase, already in upstream] fix json serialization

53662bf

Skip escaping columnnames for table-scan on hdfs

641d06c
