-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Empower Connector #902
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jburchard, everything here looks good to me, honestly! I just have a personal question! From what I understand, it seems that the connector basically grabs an object, which contains all the relevant data and then splits it up between the relevant tables that each method is associated with. Everything looks clean and makes sense. And all the documentation and mock tests and everything looks great.
@shaunagm, not sure if you might have any other comments or suggestions? Just marking this as a comment for now in case you might have anything to add before approval?
|
||
.. note:: | ||
The Empower API only has a single endpoint to access all account data. As such, it has a very high overhead. This | ||
connector employs caching in order to allow the user to specify the tables to extract without additional API calls. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More of a personal question, coming from lack of experience, but wondering what caching is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, there is only one endpoint and it grabs a giant JSON blob that is a bit of a pain to parse through and convert to a tabular-ish format. I've broken up that JSON into multiple functions so that folks can grab the bits of data that they want.
However, that means that every single time you call it function, it is grabbing a lot of extraneous data. So, the caching just stores the blob and extracts from it rather than calling the Empower server again.
At the time, it seemed like a cute idea, but perhaps its unnecessary.
To instantiate the Empower class, you can either store your ``EMPOWER_API_KEY`` an environment | ||
variables or pass them in as arguments: | ||
|
||
.. code-block:: python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be nice to have an example of disabling caching in the quickstart
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do.
else: | ||
return self.data | ||
|
||
def _unix_convert(self, ts): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can ignore this, Justin, but just flagging that this is a potential utility method (see #554 )
ts = ts.strftime("%Y-%m-%d %H:%M:%S UTC") | ||
return ts | ||
|
||
def _empty_obj(self, obj_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there is a reason this is a separate method rather than just using this code in the one place it's referenced? You could write it with just that first line - the return True and return False aren't necessary.
See :ref:`parsons-table` for output options. | ||
""" | ||
|
||
tbl = Table(self.data["profiles"]).long_table("eid", "activeCtaIds") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
long_table
is a new one on me! Mind sharing what it is and why you're using it here? Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, this documentation includes a pretty decent example. Basically, it takes a column that has a nested JSON in it -- in this case a list of active call to action ids -- and it creates a new table that is one row for each call to action id and the id of the profile. Then, if you are storing in a DB, you can easily join the two together.
However, this transformation and, many others in parson, was built before JSONs in BigQuery/Redshift became a more widespread thing. So, in a world in which you can store JSONs and query the elements in SQL, this may no longer make sense. Tldr: Should parson's connectors be extracting nested JSONs?
col_list = [v for v in tbl.columns if v.find("value") != -1] | ||
tbl.coalesce_columns("answer_id", col_list, remove_source_columns=True) | ||
tbl.remove_column("uid") | ||
tbl.remove_column("answers") # Per docs, this is deprecated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that remove_column errors if the column name isn't found. Is it possible that this column name will stop getting returned? Or do you mean it's deprecated in some other way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This column is still being returned by Empower, but their docs indicate that it isn't being used for anything any longer. So, I decided to not surface it for the user.
I added a couple of comments/questions @jburchard but nothing major. Should be pretty close to merging this. |
Thanks @shaunagm @sharinetmc for the feedback. I'll take another pass. There are also a few slight modifications (a few errors surfaced) when I started using a forked version of this, that I do want to include in the PR before it is merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just formally requesting changes so it shows up with that status in the pull request list view :)
Hi Justin, zero pressure - just wanted to see if you have plans to get to this soon? |
TMC has been using Airbyte for our Empower sync and it's no longer working so we're looking for an alternative. This looks like good work, can TMC do anything to help get this over the line? Also, we were not doing any unnesting at all at the load step, so I'd like to include a function/option to just return the results as a 1 row table with all that nested json that we would write to GCS and then load into BQ, our dbt pipelines handle the rest. |
Merged in #1191 so closing this. |
This connector adds support for the Empower relational app. As of my most recent testing, it includes support for all of the different json objects that are included in their API.