
seller_muxed_id runs into overflow as uint64 #300

Open
sydneynotthecity opened this issue Jan 2, 2025 · 1 comment

Comments

@sydneynotthecity (Contributor)

Bug Overview:

stellar-etl failed to insert records into history_effects because of a schema mismatch between the export process and BigQuery. stellar-etl writes seller_muxed_id as a uint64, but BigQuery has no unsigned integer type; it can only store signed 64-bit integers. This means that any ID above the int64 maximum (9223372036854775807) overflows the column, and the row cannot be written to BigQuery.

BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 59790; errors: 1. Please look into the errors[] collection for more details. File: gs://us-central1-hubble-14c4ca64-bucket/dag-exported/scheduled__2024-08-17T10:20:00+00:00/53071185-53071290-effects.txt; reason: invalid, location: gs://us-central1-hubble-14c4ca64-bucket/dag-exported/scheduled__2024-08-17T10:20:00+00:00/53071185-53071290-effects.txt, message: Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 59790; errors: 1. Please look into the errors[] collection for more details. File: gs://us-central1-hubble-14c4ca64-bucket/dag-exported/scheduled__2024-08-17T10:20:00+00:00/53071185-53071290-effects.txt; reason: invalid, message: Error while reading data, error message: JSON processing encountered too many errors, giving up. Rows: 59790; errors: 1; max bad: 0; error percent: 0; reason: invalid, location: gs://us-central1-hubble-14c4ca64-bucket/dag-exported/scheduled__2024-08-17T10:20:00+00:00/53071185-53071290-effects.txt, message: 
Error while reading data, error message: JSON parsing error in row starting at position 37793635: Could not convert value 'number_value: 	 "15723465199396777564"' to integer. Field: seller_muxed_id; Value: 15723465199396777564 File: gs://us-central1-hubble-14c4ca64-bucket/dag-exported/scheduled__2024-08-17T10:20:00+00:00/53071185-53071290-effects.txt
[2024-08-19, 14:05:04 UTC] {taskinstance.py:1346} INFO - Marking task as UP_FOR_RETRY. dag_id=history_table_export, task_id=del_ins_history_effects_task, execution_date=20240817T102000, start_date=20240819T140440, end_date=20240819T140504
[2024-08-19, 14:05:04 UTC] {standard_task_runner.py:104} ERROR - Failed to execute job 1393772 for task del_ins_history_effects_task (400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 59790; errors: 1. Please look into the errors[] collection for more details. File: gs://us-central1-hubble-14c4ca64-bucket/dag-exported/scheduled__2024-08-17T10:20:00+00:00/53071185-53071290-effects.txt; reason: invalid, location: gs://us-central1-hubble-14c4ca64-bucket/dag-exported/scheduled__2024-08-17T10:20:00+00:00/53071185-53071290-effects.txt, message: Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 59790; errors: 1. Please look into the errors[] collection for more details. File: gs://us-central1-hubble-14c4ca64-bucket/dag-exported/scheduled__2024-08-17T10:20:00+00:00/53071185-53071290-effects.txt; reason: invalid, message: Error while reading data, error message: JSON processing encountered too many errors, giving up. Rows: 59790; errors: 1; max bad: 0; error percent: 0; reason: invalid, location: gs://us-central1-hubble-14c4ca64-bucket/dag-exported/scheduled__2024-08-17T10:20:00+00:00/53071185-53071290-effects.txt, message: Error while reading data, error message: JSON parsing error in row starting at position 37793635: Could not convert value 'number_value: 	 "15723465199396777564"' to integer. Field: seller_muxed_id; Value: 15723465199396777564 File: gs://us-central1-hubble-14c4ca64-bucket/dag-exported/scheduled__2024-08-17T10:20:00+00:00/53071185-53071290-effects.txt; 508035)
[2024-08-19, 14:05:05 UTC] {local_task_job_runner.py:225} INFO - Task exited with return code 1

To reproduce:

Clear the following Airflow task on mainnet, del_ins_history_effects_task, for these run ids:

  • scheduled__2024-08-17T10:20:00+00:00
  • scheduled__2024-08-17T10:30:00+00:00
  • scheduled__2024-08-17T10:40:00+00:00

Example ledger with a seller_muxed_id that is too large: "ledger_sequence":53071228,"operation_id":227939188618612737

Horizon API response

Suggested Fix:

Google suggests converting uint64 values to strings to avoid such errors. We should export seller_muxed_id as a string datatype so that the full value is preserved and no data is lost. A sketch of the change is below.

@sydneynotthecity (Contributor, Author)

All three batch ids listed above need to be re-run to restore full history coverage of the table; data for those batches is currently missing.
