Error iterating pages #29

Open
mcouto-sossego opened this issue Jan 16, 2019 · 4 comments

Comments

@mcouto-sossego

When iterating over pages, some tickets may be missed because the Freshdesk dataset changes while the tap is paginating.

Since each page request may come several minutes after the previous one, agents may update tickets in between, and the paged results then shift.

Example:

page1 = ticket1, ticket2, ticket3, ..., ticket98, ticket99, ticket100
(wait 3 minutes)
page2 = ticket104, ticket105, ticket106, ...

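In the example above, tickets 101, 102 and 103 are missed if any 3 tickets from 1 to 100 are updated during the 3-minute window: the updated tickets leave the first 100 positions, so the tickets that follow shift up by three and never appear on page 2.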
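The effect can be reproduced offline with a small simulation. This is only a sketch: it assumes the endpoint returns tickets ordered by update time, ascending, and paginates by offset. `Ticket`, `fetch_page` and the logical clock are hypothetical stand-ins, not Freshdesk or tap-freshdesk code.

```python
from dataclasses import dataclass

PAGE_SIZE = 100


@dataclass
class Ticket:
    id: int
    updated_at: int  # logical clock standing in for a timestamp


def fetch_page(tickets, page):
    """Hypothetical stand-in for one API call: offset pagination over
    tickets sorted by updated_at ascending (an assumption)."""
    ordered = sorted(tickets, key=lambda t: t.updated_at)
    start = (page - 1) * PAGE_SIZE
    return [t.id for t in ordered[start:start + PAGE_SIZE]]


# 200 tickets, initially ordered by id.
tickets = [Ticket(id=i, updated_at=i) for i in range(1, 201)]

# Page 1 is fetched first and contains tickets 1..100.
seen = set(fetch_page(tickets, 1))

# Before page 2 is requested, agents update three tickets from page 1.
clock = 200
for ticket_id in (10, 20, 30):
    clock += 1
    tickets[ticket_id - 1].updated_at = clock

# Continue paginating from page 2 until an empty page comes back.
page = 2
while True:
    ids = fetch_page(tickets, page)
    if not ids:
        break
    seen.update(ids)
    page += 1

missed = sorted(set(range(1, 201)) - seen)
print(missed)  # [101, 102, 103]
```

The three updated tickets reappear at the end of the list, but the three tickets that slid into positions 98 to 100 are never returned.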

Resolution: the data for all pages must be downloaded first, before iterating over conversations and other related data.

The same applies to conversations and any other paged data.

Source: `tap_freshdesk/__init__.py`
Function: gen_request
Proposal: make all requests inside the `while` loop without any `yield`, appending each page's data to a temporary variable. Only yield rows from that variable after the loop has finished.
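A minimal sketch of that idea follows. It is not the tap's actual `gen_request`: the name `gen_request_buffered` and the `fetch_page` callable are hypothetical, and the HTTP call is left to the caller so the example runs on its own. It assumes a short page marks the end of the data.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


def gen_request_buffered(
    fetch_page: Callable[[int], List[Row]],
    per_page: int = 100,
) -> Iterator[Row]:
    """Fetch every page back to back, then yield the buffered rows.

    fetch_page is a hypothetical callable that returns the rows of a
    given 1-based page number.
    """
    rows: List[Row] = []
    page = 1
    while True:
        data = fetch_page(page)
        rows.extend(data)
        # Assumption: a page shorter than per_page is the last one.
        if len(data) < per_page:
            break
        page += 1
    # Nothing is yielded until all pages have been collected, so slow
    # downstream work no longer runs between page requests.
    yield from rows


# Usage with fake pages instead of real HTTP calls.
fake_pages = {1: [{"id": i} for i in range(1, 101)], 2: [{"id": 101}]}
rows = list(gen_request_buffered(lambda page: fake_pages.get(page, [])))
print(len(rows))  # 101
```

This shortens the gap between page requests to the request time itself, so it reduces the race rather than removing it, and it holds every row in memory until the loop ends.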

@KAllan357
Contributor

I don't think this strategy would work well in practice, because it requires holding every page in memory before any row is emitted.

Is there an alternative way to query this data using a min / max combination? A feature like that would allow us to impose a "window" on the data we paginate and only move the window after the iteration has completed.

@mcouto-sossego
Author

There is no option like that in the Freshdesk API (https://developers.freshdesk.com/api/#list_all_tickets).

We are using tap-freshdesk, and from the debug logs we can see that 3 to 10 tickets are missing from each 100-ticket page. That is roughly a 7% failure rate in an ETL process, where the only acceptable rate is zero.

@luandy64
Contributor

@mcouto-sossego Are you able to make a PR with your idea?

@dpnsh

dpnsh commented Apr 16, 2020

@mcouto-sossego were you able to find any workaround for this?

We are also using tap-freshdesk with Stitch Data in production, and this behaviour (missing tickets while iterating pages) is significantly impacting our system.
