
Adds shell script for unbounded table read, modifies table_read to execute the same. #77

Conversation

prashastia (Collaborator)
Adds a shell script that:

- Creates the asynchronously running unbounded job: the Dataproc job is created in unbounded mode.
- Dynamically adds partitions: insert_dynamic_partitions.py is executed with the necessary parameters.
- Kills the Dataproc job once the read has completed, so that its correctness can be checked further.
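The three steps above might be sketched roughly as follows. The gcloud flags and the `$UPPER_CASE` parameters are illustrative assumptions, not the PR's actual values; only the `insert_dynamic_partitions.py` path is quoted from this PR. `RUN=echo` (the default here) makes it a dry run that prints the commands instead of executing them:

```shell
#!/usr/bin/env bash
# Hedged sketch of the script's three steps; resource names and flags
# are placeholders, not the PR's actual values.
RUN="${RUN:-echo}"  # dry-run by default; set RUN= (empty) to execute

unbounded_table_read() {
  # 1. Create the asynchronously running unbounded Dataproc job;
  #    --async returns immediately instead of streaming driver output.
  $RUN gcloud dataproc jobs submit spark \
    --cluster="${CLUSTER_NAME:-test-cluster}" \
    --region="${REGION:-us-central1}" \
    --jar="${TEST_JAR:-connector-tests.jar}" \
    --async -- --mode unbounded

  # 2. Dynamically add partitions while the job is reading
  #    (script path taken from this PR).
  $RUN python3 cloudbuild/python-scripts/insert_dynamic_partitions.py -- \
    --project_name "${PROJECT_NAME:-my-project}" \
    --dataset_name "${DATASET_NAME:-my_dataset}" \
    --table_name "${TABLE_NAME:-simpleTable}" \
    --refresh_interval "${PARTITION_DISCOVERY_INTERVAL:-10}"

  # 3. Kill the Dataproc job: the read has completed and its
  #    correctness is checked afterwards.
  $RUN gcloud dataproc jobs kill "${JOB_ID:-placeholder-job-id}" \
    --region="${REGION:-us-central1}" --quiet
}
```

Gating every command behind `$RUN` keeps the sketch safe to run without a real cluster.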

/gcbrun

This module is similar to the BigQueryExample, with a few changes to count the number of records and log them.
This test reads a simpleTable.
Shell script and python script to check the number of records read.
comments CODECOV_TOKEN usage.
…ds to different tables required for the e2e tests.
…nded-read-2-3

# Conflicts:
#	cloudbuild/python-scripts/utils/utils.py
Comment on lines 39 to 41
else
echo "Unbounded Mode!"
source cloudbuild/e2e-test-scripts/unbounded_table_read.sh
Collaborator
We should explicitly check if mode is unbounded, just like the bounded case. If mode is neither bounded nor unbounded, throw an error.

python3 cloudbuild/python-scripts/insert_dynamic_partitions.py -- --project_name "$PROJECT_NAME" --dataset_name "$DATASET_NAME" --table_name "$TABLE_NAME" --refresh_interval "$PARTITION_DISCOVERY_INTERVAL"

# Wait for a bit, as mapping and output of records takes some time.
sleep 3m
Collaborator

How did we arrive at this number?

Collaborator Author

In our code we wait 2.5 minutes for the read streams to form (prior to insertion). This also makes sure that the previously inserted records are read properly. But after the last insertion there is no wait in the Python code, so we wait via the script instead.

We can remove the extra 30 seconds.

Collaborator

Having a cushion is alright. Just wanted to understand the reason behind this number.

@@ -36,7 +36,13 @@ if [ "$MODE" == "bounded" ]
then
echo "Bounded Mode!"
source cloudbuild/e2e-test-scripts/bounded_table_read.sh

elif [ "$MODE" == "unbounded" ]
Collaborator

Let's use a shell case statement, as here.
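The suggested case statement could look like the sketch below. The script paths are the ones quoted in this PR but are commented out here so the snippet stands alone; the function name and error message are illustrative:

```shell
#!/usr/bin/env bash
# Hedged sketch: dispatch on MODE with a case statement and fail fast
# on unrecognized values, as suggested in the review.
run_mode() {
  case "$1" in
    bounded)
      echo "Bounded Mode!"
      # source cloudbuild/e2e-test-scripts/bounded_table_read.sh
      ;;
    unbounded)
      echo "Unbounded Mode!"
      # source cloudbuild/e2e-test-scripts/unbounded_table_read.sh
      ;;
    *)
      echo "Unknown MODE: '$1' (expected 'bounded' or 'unbounded')" >&2
      return 1
      ;;
  esac
}
```

Unlike the if/elif chain, the `*)` arm guarantees an unrecognized mode fails loudly instead of falling through silently.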

@jayehwhyehentee jayehwhyehentee merged commit a50014b into GoogleCloudDataproc:main Jan 9, 2024
4 checks passed