First Unbounded Source Implementation #56

@prodriguezdefino (Collaborator) commented Aug 4, 2023

This implementation reads data continuously from BigQuery as new partitions are discovered in the underlying table. The underlying table must meet a few conditions to qualify for continuous reading:

  • It should be a partitioned table
  • The partition column should be of a temporal type (Date, Timestamp)
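
The two conditions above can be sketched as a simple eligibility check. This is only an illustration, not the connector's actual code: `UnboundedReadEligibility`, `TableInfo`, `ColumnType`, and `qualifiesForContinuousRead` are hypothetical names, and the table metadata is modeled with a plain class rather than the BigQuery client types.

```java
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the connector. */
class UnboundedReadEligibility {

    /** Simplified stand-in for a BigQuery column type. */
    enum ColumnType { DATE, TIMESTAMP, INTEGER, STRING }

    /** Simplified stand-in for table metadata. */
    static final class TableInfo {
        final String name;
        // Empty when the table is not partitioned.
        final Optional<ColumnType> partitionColumnType;

        TableInfo(String name, Optional<ColumnType> partitionColumnType) {
            this.name = name;
            this.partitionColumnType = partitionColumnType;
        }
    }

    // Temporal types named in the description above (assumed to be the full list).
    private static final Set<ColumnType> TEMPORAL_TYPES =
            EnumSet.of(ColumnType.DATE, ColumnType.TIMESTAMP);

    /** True only if the table is partitioned and its partition column is temporal. */
    static boolean qualifiesForContinuousRead(TableInfo table) {
        return table.partitionColumnType
                .map(TEMPORAL_TYPES::contains)
                .orElse(false);
    }

    public static void main(String[] args) {
        TableInfo daily = new TableInfo("events_by_day", Optional.of(ColumnType.DATE));
        TableInfo unpartitioned = new TableInfo("events_flat", Optional.empty());
        System.out.println(daily.name + ": " + qualifiesForContinuousRead(daily));
        System.out.println(unpartitioned.name + ": " + qualifiesForContinuousRead(unpartitioned));
    }
}
```

Modeling the partition column as an `Optional` lets a single check cover both conditions: a non-partitioned table has no partition column, so it fails before the type is ever inspected.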

…ed table is expired and no longer accessible.
…rom time to time, the pipeline should be able to recover reading from the beginning of the split or the last checkpointed offset
@prodriguezdefino
Collaborator Author

/gcbrun

@prodriguezdefino
Collaborator Author

/gcbrun

@jayehwhyehentee
Collaborator

/gcbrun

@jayehwhyehentee
Collaborator

/gcbrun

@jayehwhyehentee jayehwhyehentee force-pushed the unboundedsource_completepartitions branch from 6fc9a8c to f56d571 Compare December 11, 2023 07:26
@jayehwhyehentee
Collaborator

/gcbrun

@jayehwhyehentee
Collaborator

/gcbrun

@jayehwhyehentee jayehwhyehentee merged commit 534898c into GoogleCloudDataproc:main Dec 11, 2023
4 checks passed
@jayehwhyehentee jayehwhyehentee deleted the unboundedsource_completepartitions branch December 12, 2023 05:11