Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Empower Connector #902

Closed
wants to merge 5 commits into from
Closed

Add Empower Connector #902

wants to merge 5 commits into from

Conversation

jburchard
Copy link
Collaborator

This connector adds support for the Empower relational app. As of my most recent testing, it includes support for all of the different json objects that are included in their API.

@jburchard jburchard added the new connector Work type - creating a new Parsons connector for a tool label Oct 7, 2023
@jburchard jburchard requested a review from shaunagm October 7, 2023 21:50
@shaunagm shaunagm requested review from KasiaHinkson and removed request for shaunagm October 26, 2023 19:58
@shaunagm shaunagm requested review from sharinetmc and removed request for KasiaHinkson November 9, 2023 20:38
@shaunagm shaunagm assigned sharinetmc and unassigned KasiaHinkson Nov 9, 2023
Copy link
Contributor

@sharinetmc sharinetmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jburchard, everything here looks good to me, honestly! I just have a personal question! From what I understand, it seems that the connector basically grabs an object, which contains all the relevant data and then splits it up between the relevant tables that each method is associated with. Everything looks clean and makes sense. And all the documentation and mock tests and everything looks great.

@shaunagm, not sure if you might have any other comments or suggestions? Just marking this as a comment for now in case you might have anything to add before approval?


.. note::
The Empower API only has a single endpoint to access all account data. As such, it has a very high overhead. This
connector employs caching in order to allow the user to specify the tables to extract without additional API calls.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More of a personal question, coming from lack of experience, but wondering what caching is?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, there is only one endpoint and it grabs a giant JSON blob that is a bit of a pain to parse through and convert to a tabular-ish format. I've broken up that JSON into multiple functions so that folks can grab the bits of data that they want.

However, that means that every single time you call it function, it is grabbing a lot of extraneous data. So, the caching just stores the blob and extracts from it rather than calling the Empower server again.

At the time, it seemed like a cute idea, but perhaps its unnecessary.

To instantiate the Empower class, you can either store your ``EMPOWER_API_KEY`` an environment
variables or pass them in as arguments:

.. code-block:: python
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to have an example of disabling caching in the quickstart

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do.

else:
return self.data

def _unix_convert(self, ts):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can ignore this, Justin, but just flagging that this is a potential utility method (see #554 )

ts = ts.strftime("%Y-%m-%d %H:%M:%S UTC")
return ts

def _empty_obj(self, obj_name):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there is a reason this is a separate method rather than just using this code in the one place it's referenced? You could write it with just that first line - the return True and return False aren't necessary.

See :ref:`parsons-table` for output options.
"""

tbl = Table(self.data["profiles"]).long_table("eid", "activeCtaIds")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

long_table is a new one on me! Mind sharing what it is and why you're using it here? Thank you!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this documentation includes a pretty decent example. Basically, it takes a column that has a nested JSON in it -- in this case a list of active call to action ids -- and it creates a new table that is one row for each call to action id and the id of the profile. Then, if you are storing in a DB, you can easily join the two together.

However, this transformation and, many others in parson, was built before JSONs in BigQuery/Redshift became a more widespread thing. So, in a world in which you can store JSONs and query the elements in SQL, this may no longer make sense. Tldr: Should parson's connectors be extracting nested JSONs?

col_list = [v for v in tbl.columns if v.find("value") != -1]
tbl.coalesce_columns("answer_id", col_list, remove_source_columns=True)
tbl.remove_column("uid")
tbl.remove_column("answers") # Per docs, this is deprecated.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that remove_column errors if the column name isn't found. Is it possible that this column name will stop getting returned? Or do you mean it's deprecated in some other way?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This column is still being returned by Empower, but their docs indicate that it isn't being used for anything any longer. So, I decided to not surface it for the user.

@shaunagm
Copy link
Collaborator

shaunagm commented Dec 4, 2023

I added a couple of comments/questions @jburchard but nothing major. Should be pretty close to merging this.

@jburchard
Copy link
Collaborator Author

Thanks @shaunagm @sharinetmc for the feedback. I'll take another pass. There are also a few slight modifications (a few errors surfaced) when I started using a forked version of this, that I do want to include in the PR before it is merged.

Copy link
Collaborator

@shaunagm shaunagm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just formally requesting changes so it shows up with that status in the pull request list view :)

@shaunagm
Copy link
Collaborator

Hi Justin, zero pressure - just wanted to see if you have plans to get to this soon?

@KasiaHinkson
Copy link
Contributor

TMC has been using Airbyte for our Empower sync and it's no longer working so we're looking for an alternative. This looks like good work, can TMC do anything to help get this over the line?

Also, we were not doing any unnesting at all at the load step, so I'd like to include a function/option to just return the results as a 1 row table with all that nested json that we would write to GCS and then load into BQ, our dbt pipelines handle the rest.

@shaunagm
Copy link
Collaborator

shaunagm commented Dec 2, 2024

Merged in #1191 so closing this.

@shaunagm shaunagm closed this Dec 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new connector Work type - creating a new Parsons connector for a tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants