AirbyteLib: Progress Printer #34588
Conversation
…le flag during install
…e-lib/progress-print
Love this! Left two suggestions, but they are incremental improvements, not blockers.
from rich.markdown import Markdown as RichMarkdown

try:
I really like these small QOL checks, but could we also add an option for this, like pinecone does?
Some environments don't handle this kind of console usage very well, and it's nice to be able to mute it for clean output in a more "production" setting.
I'm definitely on board for this. The refactoring would be significant at this stage though, specifically because the progress messages are currently sent from a few different places in the codebase. While I'm not able to do so in this iteration, I definitely think this is worth doing and will probably get to it down the road.
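For reference, here's a rough sketch of what such an opt-out could look like. The `ProgressPrinter` class, `show_progress` flag, and `AIRBYTE_LIB_SHOW_PROGRESS` env var are hypothetical names used for illustration, not the actual AirbyteLib API:

```python
import os


class ProgressPrinter:
    """Hypothetical sketch of a progress printer with an opt-out flag."""

    def __init__(self, show_progress: bool | None = None) -> None:
        # An explicit argument wins; otherwise fall back to an env var,
        # defaulting to "on" for interactive use.
        if show_progress is None:
            show_progress = os.environ.get("AIRBYTE_LIB_SHOW_PROGRESS", "1") != "0"
        self.show_progress = show_progress

    def log_records_read(self, count: int) -> None:
        if not self.show_progress:
            return  # Muted: skip all console rendering.
        print(f"Records read: {count:,}", end="\r", flush=True)
```

Callers who want clean logs in a production setting could then pass `show_progress=False` once, or set the env var, without touching the rest of the sync code.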
# This is some math to make updates adaptive to the scale of records read.
# We want to update the display more often when the count is low, and less
# often when the count is high.
updated_period = min(
IMHO it would make sense to throttle these updates based on the elapsed time.
The current approach works well in a lot of scenarios, but gets weird for edge cases like:
- Records are read really fast in the beginning, then get slow (e.g. because a second, slower stream is being read). Updates would then happen really rarely, so the user can't follow the slow progress on the second stream.
- Records are read glacially slowly (like one per second). It takes a long time for anything to show up in the first place.
Just updating once a second or so keeps the display lively without any risk of overloading the terminal.
This is good feedback. It's not obvious here, but there's also a time-based throttle inside update_display(). In this layer I'm trying to keep performance as fast as possible, hence the math-based check only. We could probably add a time-elapsed check as well, but this codepath may be called literally thousands of times per second, so I'm wary of changing it further right now. Happy to keep iterating after merging, though.
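To make the two layers concrete, here's a simplified sketch of count-based gating in the hot path combined with a wall-clock throttle at render time. The names and thresholds are illustrative, not the actual AirbyteLib code:

```python
import time

MIN_REFRESH_SECONDS = 0.5  # Illustrative: redraw at most ~twice per second.


class ProgressTracker:
    """Hypothetical sketch combining a count gate with a time throttle."""

    def __init__(self) -> None:
        self.total_records = 0
        self.update_period = 1  # Re-evaluated as counts grow.
        self.last_refresh = 0.0

    def log_records_read(self, new_records: int) -> None:
        self.total_records += new_records
        # Cheap count-based gate in the hot path: only consider rendering
        # every Nth record, with N growing as the total grows.
        if self.total_records % self.update_period != 0:
            return
        self.update_period = min(10_000, max(1, self.total_records // 100))
        self._maybe_refresh()

    def _maybe_refresh(self) -> None:
        # Wall-clock throttle: even when the count gate passes, skip the
        # redraw if we refreshed very recently.
        now = time.monotonic()
        if now - self.last_refresh < MIN_REFRESH_SECONDS:
            return
        self.last_refresh = now
        print(f"Records read: {self.total_records:,}", end="\r", flush=True)
```

The count gate keeps the per-record cost to a modulo check, while the time throttle bounds how often the terminal is actually redrawn.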
…oundError to avoid confusion
…e-lib/progress-print
Note:
A sync operation can take 5 minutes, 30 minutes, or multiple hours. This PR adds a self-updating progress-print when run from IPython.
As much as possible, I've tried to avoid impacting performance.
Sample output below:
This is the runtime of extracting just the 'issues' stream for the repo 'airbytehq/airbyte', using the GitHub source:
What's nice is that I can see at a glance that the write to the SQL table took under 2 minutes while the extraction from the source took 24 minutes, which suggests this workload is throttled by GitHub's API rate limits. Running with 2 or 3 auth tokens to rotate (which I think this source supports) would make an interesting experiment and possible demo content.
It's also nice that the display updates while the sync is running, so I don't need to wonder if the code got stuck. If users have an estimate of how many rows they expect, they can also deduce a rough % progress, even though we can't detect that programmatically.
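For anyone curious how the pieces fit together, here's a simplified sketch of the general pattern (IPython detection plus a rich live display). It's illustrative only, not the exact code in this PR; names, timings, and record counts are made up for the example:

```python
# Simplified illustration of IPython detection plus a `rich` live display.
import time

from rich.live import Live
from rich.markdown import Markdown as RichMarkdown

try:
    # Available only when running under IPython / Jupyter.
    from IPython import get_ipython

    IS_INTERACTIVE = get_ipython() is not None
except ImportError:
    IS_INTERACTIVE = False


def run_with_progress(total_batches: int = 5) -> None:
    if not IS_INTERACTIVE:
        # Plain fallback so non-interactive environments get simple log lines.
        for batch in range(1, total_batches + 1):
            time.sleep(0.2)  # Stand-in for real read/write work.
            print(f"Read {batch * 1_000:,} records")
        return

    with Live(RichMarkdown("## Sync starting"), refresh_per_second=4) as live:
        for batch in range(1, total_batches + 1):
            time.sleep(0.2)  # Stand-in for real read/write work.
            live.update(RichMarkdown(f"## Read {batch * 1_000:,} records"))
```

The fallback branch is what keeps logs clean when the code runs outside an interactive session; the `Live` block is what produces the self-updating output shown above.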