Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Updating data pipeline to limit fetch count and HBCUs priority list
Showing 6 changed files with 51 additions and 40 deletions.
````diff
@@ -49,12 +49,17 @@ You need a `.env` file to store secrets and other environment variables as follo
 
 ```
 [email protected]
-INSTITUTION_FILTER=hbcus
+INSTITUTIONS_FETCH_FILTER=hbcus
+INSTITUTIONS_FETCH_COUNT=5
 ```
 
 The OPENALEX_EMAIL secret is used to [speed up calls](https://docs.openalex.org/how-to-use-the-api/api-overview) to the OpenAlex REST API.
 
-The INSTITUTION_FILTER (allowed values = `hbcus` or `howardu`) is used to configure which institutions will be fetched from the OpenAlex API and saved to `observable/docs/data/institutions.json`. You will need to delete the existing `institutions.json` file from your local to ensure that a fresh API call is made.
+INSTITUTIONS_FETCH_FILTER (allowed values = `hbcus` or `howardu`) is used to configure which institutions will be fetched from the OpenAlex API and saved to `observable/docs/data/institutions.json`.
+
+INSTITUTIONS_FETCH_COUNT determines how many institutions will be loaded in the application.
+
+>**NOTE:** INSTITUTIONS_FETCH_FILTER and INSTITUTIONS_FETCH_COUNT are only used when running `fetch_custom_institutions.py` as a script. When using `invoke fetch` the default values of `hbcus` and `5` are used respectively.
 
 ## Running
 
@@ -75,7 +80,7 @@ Deployments to this project on the Observable Cloud take place through the **Dep
 
 You can run various other commands using `invoke` as follows.
 
-Fetch HBCUs institutions data from the OpenAlex API and save it to `observable/docs/data/institutions.json`:
+Fetch first 5 HBCUs institutions data from the OpenAlex API and save it to `observable/docs/data/institutions.json`:
 
 ```bash
 invoke fetch
````
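As a rough illustration of how a script might consume the README's settings, here is a minimal Python sketch. It assumes only what the README documents: `INSTITUTIONS_FETCH_FILTER` takes `hbcus` or `howardu`, `INSTITUTIONS_FETCH_COUNT` is a count, and the defaults are `hbcus` and `5`. The helpers `read_fetch_settings` and `institutions_url` are hypothetical and are not taken from `fetch_custom_institutions.py`. The `per-page` and `mailto` query parameters are part of the public OpenAlex REST API; `mailto` is how a request joins OpenAlex's faster "polite pool", which is presumably what the OPENALEX_EMAIL note refers to.

```python
import os
from typing import Mapping, Optional, Tuple
from urllib.parse import urlencode

# Values documented in the README; the defaults match what `invoke fetch` is said to use.
ALLOWED_FILTERS = {"hbcus", "howardu"}
DEFAULT_FILTER = "hbcus"
DEFAULT_COUNT = 5


def read_fetch_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[str, int]:
    """Hypothetical helper: return (filter, count) from the environment or the defaults."""
    env = os.environ if env is None else env
    fetch_filter = env.get("INSTITUTIONS_FETCH_FILTER", DEFAULT_FILTER)
    if fetch_filter not in ALLOWED_FILTERS:
        raise ValueError(
            f"INSTITUTIONS_FETCH_FILTER must be one of {sorted(ALLOWED_FILTERS)}, got {fetch_filter!r}"
        )
    raw_count = env.get("INSTITUTIONS_FETCH_COUNT", str(DEFAULT_COUNT))
    try:
        count = int(raw_count)
    except ValueError:
        raise ValueError(f"INSTITUTIONS_FETCH_COUNT must be an integer, got {raw_count!r}") from None
    if count < 1:
        raise ValueError("INSTITUTIONS_FETCH_COUNT must be at least 1")
    return fetch_filter, count


def institutions_url(count: int, email: Optional[str] = None) -> str:
    """Hypothetical helper: build an OpenAlex /institutions URL returning at most `count` results."""
    params = {"per-page": count}
    if email:
        # `mailto` places the request in OpenAlex's polite pool.
        params["mailto"] = email
    return "https://api.openalex.org/institutions?" + urlencode(params)
```

For example, `read_fetch_settings()` with no variables set returns `("hbcus", 5)`, and `institutions_url(5, os.environ.get("OPENALEX_EMAIL"))` yields a URL limited to five results. Raising on an unknown filter value, rather than silently falling back to the default, surfaces a typo in `.env` immediately.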
```diff
@@ -1,34 +1,20 @@
 from pyalex import Institution, Institutions
 import json
-import os
-import scripts.fetch_custom_institutions as fetch_custom_institutions
+import sys
 
 
-def get_institutions(institutions_file_path: str = "observable/docs/data/institutions.json") -> list[Institution]:
+def get_institutions(institutions_file_path: str = "docs/data/institutions.json") -> list[Institution]:
     institutions = []
 
     # Load institutions from JSON file
     try:
         institutions = json.load(open(institutions_file_path))
     except Exception as e:
-        print("\nError loading institutions from JSON file", institutions_file_path, ":", e, "\n")
-
-    # Fetch institutions from API if JSON file is empty or not found
-    try:
-        if institutions is None or len(institutions) == 0:
-            print("No institutions found in JSON file, attempting to fetch from the API\n")
-            institutions = fetch_custom_institutions.fetch_institutions_from_api(os.getenv("INSTITUTION_FILTER"))
-    except Exception as e:
-        print("\nError fetching institutions from the API:", e, "\n")
+        print("\nError loading institutions from JSON file", institutions_file_path, ":", e, "\n", file=sys.stderr)
 
     # Get 5 random institutions in case of error
     if institutions is None or len(institutions) == 0:
-        print("No institutions found in JSON file or fetched from the API, fetching random institutions\n")
+        print("No institutions found in JSON file, fetching random institutions\n", file=sys.stderr)
         institutions = [Institutions().random() for _ in range(5)]
 
     return institutions
-
-
-if __name__ == "__main__":
-    institutions = get_institutions()
-    print("Loaded", len(institutions), "institutions\n")
```
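The reworked `get_institutions` reduces to a load-or-fallback pattern: read the cached JSON file, and only if that yields nothing, build a small fallback list, writing every diagnostic to stderr so it stays out of anything a caller captures from stdout. The sketch below is a hedged restatement of that pattern, not the repository's code. `load_institutions` and its `fallback` parameter are hypothetical, introduced so the logic runs without network access. It also swaps the broad `except Exception` for file and JSON errors only, and closes the file with a `with` block.

```python
import json
import sys
from typing import Any, Callable, List


def load_institutions(path: str, fallback: Callable[[], Any], fallback_count: int = 5) -> List[Any]:
    """Hypothetical sketch: load cached institutions from `path`, else call `fallback` `fallback_count` times."""
    institutions = []
    try:
        # `with` closes the file handle, unlike a bare json.load(open(path)).
        with open(path) as f:
            institutions = json.load(f)
    except (OSError, json.JSONDecodeError) as e:
        # Missing, unreadable, or malformed cache: report on stderr, keep stdout clean.
        print(f"Error loading institutions from JSON file {path}: {e}", file=sys.stderr)

    # `not institutions` covers both a JSON `null` (None) and an empty list.
    if not institutions:
        print("No institutions found in JSON file, using fallback", file=sys.stderr)
        institutions = [fallback() for _ in range(fallback_count)]
    return institutions
```

With pyalex installed, `load_institutions("docs/data/institutions.json", Institutions().random)` would roughly reproduce the commit's behavior of fetching five random institutions when the cache is missing or empty. Injecting `fallback` keeps the function testable offline.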
Large diffs are not rendered by default.