Refactoring API data pipeline for fetching institutions data and updating Readme #18

Merged
kaaloo merged 3 commits into main from fetch-custom-institutions
Apr 25, 2024

Conversation

whymath (Collaborator) commented Apr 24, 2024

No description provided.

kaaloo (Contributor) left a comment

Fantastic job @whymath! Thank you so much for all the updates, including the improvements to the README. There are a few issues to attend to, but I ran both the script and the data loader. I'm testing the 85 institutions now. I expect we'll have to select a few when developing because of the time it takes to fetch the data when running invoke dev. As for displaying the data on the graph, it could definitely be challenging, if it even works. What would you suggest? Maybe we can limit to 5 institutions for now.

collabnext/openalex/institutions.py (5 review comments, outdated and resolved)
whymath requested a review from kaaloo April 25, 2024 04:02

whymath (Collaborator, Author) commented Apr 25, 2024

Makes sense, @kaaloo. I have added a new parameter to the script and function so that the number of institutions fetched and loaded is now configurable, with the default value set to 5 in the invoke fetch task.

I have also moved the 5 institutions recommended by @lewlefton to the top of hbcus_names_list.csv so they will always be fetched and loaded first. @lewlefton, note that I had to change the name "Alabama A&M University" to "Alabama Agricultural and Mechanical University", since that is how the institution is named in OpenAlex.
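
For readers following along, here is a rough sketch of the idea only, not the code merged in this PR. The function name `read_institution_names`, the `limit` parameter, and the assumed CSV layout (one name per row, first column, no header row) are all hypothetical. It shows why the order of rows in the file matters: a limit of 5 simply takes the first 5 names.

```python
import csv
from itertools import islice
from typing import Iterable, List, Optional


def read_institution_names(lines: Iterable[str], limit: Optional[int] = 5) -> List[str]:
    """Return up to `limit` institution names, preserving file order.

    Hypothetical helper: assumes one name per row in the first column
    and no header row. Pass limit=None to read every row.
    """
    rows = csv.reader(lines)
    # Skip blank rows and rows whose first cell is empty.
    names = (row[0].strip() for row in rows if row and row[0].strip())
    # islice with a stop of None consumes the whole iterator.
    return list(islice(names, limit))


if __name__ == "__main__":
    # Placeholder rows; only the first name comes from this thread.
    sample = [
        "Alabama Agricultural and Mechanical University",
        "Placeholder Institution B",
        "Placeholder Institution C",
    ]
    print(read_institution_names(sample, limit=2))
```

Because `csv.reader` accepts any iterable of lines, the same function works on an open file handle for the names CSV. How the limit is passed through from the invoke fetch task is not shown here.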

kaaloo (Contributor) left a comment

Thank you so much @whymath! Looks great!

kaaloo (Contributor) commented Apr 25, 2024

@whymath I'll go ahead and merge your PR since you will only be around a bit later in the day. Great job!

kaaloo merged commit dbea5f6 into main Apr 25, 2024
kaaloo deleted the fetch-custom-institutions branch April 25, 2024 09:00