[GLUTEN-5659][VL] Add more configs for AWS s3 #5660

Merged 3 commits on Jun 25, 2024

@yma11 yma11 commented May 8, 2024

What changes were proposed in this pull request?

Add more configs for AWS s3

spark.gluten.velox.fs.s3a.retry.mode
spark.gluten.velox.fs.s3a.connect.timeout
spark.hadoop.fs.s3a.retry.limit
spark.hadoop.fs.s3a.connection.maximum

How was this patch tested?

CI

github-actions bot commented May 8, 2024

#5659

github-actions bot commented May 8, 2024

Run Gluten Clickhouse CI

val AWS_S3_RETRY_MODE =
  buildConf("spark.gluten.velox.fs.s3a.retry.mode")
    .internal()
    .doc("Retry mode for AWS s3 connection error.")
Contributor

Can you also mention the "standard" and "adaptive" modes in the doc?
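
For illustration, the three retry-mode values that appear in this PR ("legacy" as the default, plus "standard" and "adaptive" from the comment above) could be validated as in the sketch below. The enum and the `parseRetryMode` function are hypothetical names introduced here; they are not taken from Gluten or Velox, and the real code may handle the string differently.

```cpp
#include <optional>
#include <string>

// Retry modes named in this pull request. The enum and the parser are
// hypothetical helpers for illustration, not Gluten or Velox APIs.
enum class RetryMode { kLegacy, kStandard, kAdaptive };

// Maps a config string to a mode; returns std::nullopt for an
// unrecognized value so the caller can decide how to report it.
std::optional<RetryMode> parseRetryMode(const std::string& value) {
  if (value == "legacy") {
    return RetryMode::kLegacy;
  }
  if (value == "standard") {
    return RetryMode::kStandard;
  }
  if (value == "adaptive") {
    return RetryMode::kAdaptive;
  }
  return std::nullopt;
}
```

Listing the accepted values in the `.doc(...)` string, as the reviewer suggests, would let users discover them without reading a parser like this one.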

@@ -687,6 +699,10 @@ object GlutenConfig {
(SPARK_S3_USE_INSTANCE_CREDENTIALS, "false"),
(SPARK_S3_IAM, ""),
(SPARK_S3_IAM_SESSION_NAME, ""),
(SPARK_S3_RETRY_MAX_ATTEMPTS, "3"),
(SPARK_S3_CONNECTION_MAXIMUM, "96"),
Contributor

96 and 3 could also be declared as the default values of their configs and referenced here through something like AWS_S3_CONNECT_TIMEOUT.defaultValueString.

Contributor Author

Yes, these keys could also use defaultValueString, but the existing code writes the value directly for readability unless the config is new in Gluten itself. So let's keep it as is.

const std::string kVeloxS3RetryModeDefault = "legacy";
// Connection timeout for AWS s3
const std::string kVeloxS3ConnectTimeout = "spark.gluten.velox.fs.s3a.connect.timeout";
const std::string kVeloxS3ConnectTimeoutDefault = "1s";
Contributor

Can we align the default config values with the Hadoop config (https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-common/core-default.xml)? For example, the default value for fs.s3a.connection.timeout is 200000ms.

Contributor Author

200s seems like quite a large number. @FelixYBW, what do you think?

Contributor

Let's use Hadoop's default.
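
The thread compares values written in different units: "1s" in the Gluten snippet and 200000ms in the Hadoop defaults. The sketch below normalizes such strings to milliseconds so they can be compared directly. `toMillis` is a hypothetical helper that only understands the "ms" and "s" suffixes; it is not how Gluten or Velox parse durations.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: converts "<digits><unit>" to milliseconds, where
// the unit is "ms" or "s". Returns std::nullopt for any other input.
std::optional<std::int64_t> toMillis(const std::string& text) {
  std::size_t pos = 0;
  while (pos < text.size() &&
         std::isdigit(static_cast<unsigned char>(text[pos]))) {
    ++pos;
  }
  // Require at least one digit, and cap the length so that std::stoll
  // and the multiplication below cannot overflow.
  if (pos == 0 || pos > 15) {
    return std::nullopt;
  }
  const std::int64_t number = std::stoll(text.substr(0, pos));
  const std::string unit = text.substr(pos);
  if (unit == "ms") {
    return number;
  }
  if (unit == "s") {
    return number * 1000;
  }
  return std::nullopt;
}
```

With this helper, `toMillis("200s")` and `toMillis("200000ms")` yield the same value, which matches the "200s" shorthand used in the replies.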

@FelixYBW
Contributor

Is the Velox PR merged? Can this PR be merged?

leesf commented Jun 19, 2024

> Is the Velox PR merged? Can this PR be merged?

Yes, the related changes have been merged in Velox: facebookincubator/velox@74fe2ba

yma11 commented Jun 19, 2024

@leesf Thanks for the comments. I have updated the PR; please review it again.

@@ -688,6 +700,10 @@ object GlutenConfig {
(SPARK_S3_USE_INSTANCE_CREDENTIALS, "false"),
(SPARK_S3_IAM, ""),
(SPARK_S3_IAM_SESSION_NAME, ""),
(SPARK_S3_RETRY_MAX_ATTEMPTS, "3"),
(SPARK_S3_CONNECTION_MAXIMUM, "96"),
Contributor

fs.s3a.attempts.maximum and fs.s3a.connection.maximum can also be aligned with the Hadoop config; their default values are 20 and 15, respectively.

.internal()
.doc("Timeout for AWS s3 connection.")
.stringConf
.createWithDefault("1s")
Contributor

This can be changed to 200s as well.

@@ -64,6 +71,10 @@ std::shared_ptr<facebook::velox::core::MemConfig> getHiveConfig(std::shared_ptr<
bool useInstanceCredentials = conf->get<bool>("spark.hadoop.fs.s3a.use.instance.credentials", false);
std::string iamRole = conf->get<std::string>("spark.hadoop.fs.s3a.iam.role", "");
std::string iamRoleSessionName = conf->get<std::string>("spark.hadoop.fs.s3a.iam.role.session.name", "");
std::string retryMaxAttempts = conf->get<std::string>("spark.hadoop.fs.s3a.retry.limit", "3");
std::string retryMode = conf->get<std::string>(kVeloxS3RetryMode, kVeloxS3RetryModeDefault);
std::string maxConnections = conf->get<std::string>("spark.hadoop.fs.s3a.connection.maximum", "96");
Contributor

3 and 96 should be changed as well.
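
The defaults discussed here are written out in more than one place in this PR (the Scala tuples and the C++ `conf->get` calls), so changing them means editing each copy. A minimal sketch of keeping the C++ side in named constants is shown below. `ConfMap`, `getOr`, and the four constant names are hypothetical stand-ins, not Gluten code; the values "3" and "96" are the ones in the snippet under review, not necessarily what was finally merged.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the config object used in getHiveConfig.
using ConfMap = std::unordered_map<std::string, std::string>;

// Keys as they appear in the reviewed snippet. The constant names are
// assumptions made for this example.
const std::string kS3RetryLimitKey = "spark.hadoop.fs.s3a.retry.limit";
const std::string kS3MaxConnectionsKey =
    "spark.hadoop.fs.s3a.connection.maximum";

// Defaults from the snippet under review; the reviewer asks for these
// to be changed, which would then be a one-line edit per value.
const std::string kS3RetryLimitDefault = "3";
const std::string kS3MaxConnectionsDefault = "96";

// Returns the configured value, or the fallback when the key is absent.
std::string getOr(
    const ConfMap& conf,
    const std::string& key,
    const std::string& fallback) {
  const auto it = conf.find(key);
  return it == conf.end() ? fallback : it->second;
}
```

A caller would write `getOr(conf, kS3RetryLimitKey, kS3RetryLimitDefault)`, so a value set by the user still takes precedence over the default.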

@FelixYBW FelixYBW merged commit cf04f0f into apache:main Jun 25, 2024
42 checks passed
@FelixYBW FelixYBW deleted the s3-retry branch June 25, 2024 02:34
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_5660_time.csv | log/native_master_06_24_2024_f07e348f4_time.csv | difference | percentage |
|---|---|---|---|---|
| q1 | 34.93 | 35.39 | 0.461 | 101.32% |
| q2 | 26.82 | 23.65 | -3.170 | 88.18% |
| q3 | 38.57 | 40.35 | 1.775 | 104.60% |
| q4 | 35.64 | 32.68 | -2.956 | 91.70% |
| q5 | 71.89 | 70.64 | -1.251 | 98.26% |
| q6 | 8.02 | 9.08 | 1.057 | 113.18% |
| q7 | 85.04 | 80.58 | -4.460 | 94.76% |
| q8 | 86.73 | 87.90 | 1.173 | 101.35% |
| q9 | 119.73 | 125.50 | 5.770 | 104.82% |
| q10 | 46.51 | 48.85 | 2.339 | 105.03% |
| q11 | 22.45 | 20.45 | -1.999 | 91.09% |
| q12 | 24.36 | 26.51 | 2.154 | 108.84% |
| q13 | 39.83 | 38.63 | -1.205 | 96.97% |
| q14 | 18.07 | 22.31 | 4.248 | 123.51% |
| q15 | 33.60 | 31.83 | -1.761 | 94.76% |
| q16 | 14.41 | 14.13 | -0.276 | 98.08% |
| q17 | 103.26 | 103.74 | 0.477 | 100.46% |
| q18 | 149.43 | 144.47 | -4.965 | 96.68% |
| q19 | 13.88 | 13.92 | 0.041 | 100.29% |
| q20 | 28.38 | 29.16 | 0.783 | 102.76% |
| q21 | 260.90 | 264.29 | 3.390 | 101.30% |
| q22 | 15.43 | 12.24 | -3.189 | 79.32% |
| total | 1277.87 | 1276.30 | -1.568 | 99.88% |

deepashreeraghu pushed a commit to deepashreeraghu/incubator-gluten that referenced this pull request Jun 26, 2024