Error iterating pages #29
Comments
I don't think this strategy would work well in practice because of the memory usage it would impose. Is there an alternative way to query this data using a min/max combination? A feature like that would let us impose a "window" on the data we paginate and move the window only after the iteration has completed.
There is no such option in the Freshdesk API (https://developers.freshdesk.com/api/#list_all_tickets). We are using tap-freshdesk, and from the debug logs we found that 3 to 10 tickets are missing from each 100-ticket page. That is roughly a 7% failure rate in an ETL process, where the only acceptable rate is zero.
@mcouto-sossego Are you able to make a PR with your idea?
@mcouto-sossego Were you able to find a workaround for this? We are also using tap-freshdesk with Stitch Data in production, and this behaviour (missing tickets while iterating pages) is significantly impacting our system.
When iterating over pages, some tickets may be missed because the Freshdesk dataset changes while it is being read.
Each page request may come several minutes after the previous one, so agents may update tickets in between and the paged results may shift.
Example:
page1 = ticket1, ticket2, ticket3, ..., ticket 98, ticket99, ticket100
(wait 3 minutes)
page2 = ticket104, ticket105, ticket106, ...
In the above example, tickets 101, 102 and 103 are missed if any 3 tickets from 1 to 100 are updated during the 3-minute wait.
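The example above can be reproduced with a small simulation. This is only a sketch under an assumption the issue does not state: that the list is ordered by last update, oldest first, so an updated ticket moves to the end of the list. The names `fetch_page` and `simulate` are hypothetical and are not part of tap-freshdesk.

```python
def fetch_page(tickets, page, per_page):
    """Return one page from an ordered list of ticket ids (1-based page number)."""
    start = (page - 1) * per_page
    return tickets[start:start + per_page]


def simulate(total=200, per_page=100, updated=(5, 40, 77)):
    # Assumption: tickets are ordered by last update, oldest first.
    tickets = list(range(1, total + 1))

    # First request: page 1 is read before anything changes.
    seen = list(fetch_page(tickets, 1, per_page))

    # During the wait, agents update three tickets from page 1.
    # Each updated ticket moves to the end of the ordering.
    for ticket_id in updated:
        tickets.remove(ticket_id)
        tickets.append(ticket_id)

    # Second request: page 2 is read from the shifted list.
    seen += fetch_page(tickets, 2, per_page)

    missed = sorted(set(range(1, total + 1)) - set(seen))
    return seen, missed


if __name__ == "__main__":
    seen, missed = simulate()
    print("first ticket on page 2:", seen[100])
    print("missed tickets:", missed)
```

Under that ordering assumption, the three updated tickets leave page 1, the rest of the list shifts up by three positions, and the tickets that slide into the already-read page are never returned.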
Resolution: the data from all pages must be downloaded in one pass, before iterating over conversations and other domains.
The same applies to conversations (which are also paged), etc.
Source: tap_freshdesk/__init__.py
Function: gen_request
Proposition: make all requests inside the "while" loop without any "yield", appending each page's data to a temporary variable. Yield rows from the temporary variable only after the loop has finished.
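A minimal sketch of that proposition, not the actual `gen_request` code: the function name `gen_request_buffered`, the `fetch_page` callable, and the stop condition (a page shorter than `per_page` is the last one) are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterator, List


def gen_request_buffered(
    fetch_page: Callable[[int], List[Dict]],  # hypothetical: returns the rows of one page
    per_page: int = 100,
) -> Iterator[Dict]:
    """Fetch every page first, then yield the buffered rows."""
    rows: List[Dict] = []  # the temporary variable from the proposition
    page = 1
    while True:
        data = fetch_page(page)
        rows.extend(data)
        # Assumption: a short (or empty) page means there are no more pages.
        if len(data) < per_page:
            break
        page += 1

    # Nothing is yielded until the loop is done, so slow per-row work
    # (for example fetching conversations) cannot delay the next page request.
    yield from rows
```

The trade-off, as raised in the comments, is that every row is held in memory before any row is emitted.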