Uses the NewsData API to request news content and push results to a specified Google BigQuery database.
For more details, go to: https://newsdata.io/documentation
- Python 3.10 or newer
- A Google BigQuery service account
- A NewsDataIO API key
Set the environment variables listed below. In PyCharm, this can be done by going to Run -> Edit Configurations -> Environment Variables.
You will need:
Name | Value |
---|---|
gbq_servicekey | /path/to/your/credentials.json |
newsdata_apikey | your NewsDataIO API key (value) |
Ensure that the names of your credentials are exactly as above, and that your path and key values are correct for your specific setup.
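A quick way to confirm the two variables are visible to Python before running anything else is a small check like the one below. This is only a sketch: the helper name `check_environment` is hypothetical and is not part of this repository; only the variable names `gbq_servicekey` and `newsdata_apikey` come from the table above.

```python
import os
from pathlib import Path

# Variable names taken from the table above.
REQUIRED_VARS = ("gbq_servicekey", "newsdata_apikey")


def check_environment(env=os.environ):
    """Return a list of problems found; an empty list means the setup looks fine.

    Hypothetical helper for illustration, not part of this project.
    """
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"environment variable {name!r} is not set")
    # The service key variable should point to an existing credentials file.
    key_path = env.get("gbq_servicekey")
    if key_path and not Path(key_path).is_file():
        problems.append(f"gbq_servicekey points to a missing file: {key_path}")
    return problems
```

Calling `check_environment()` with no arguments inspects the current process environment, and printing the returned list shows what still needs fixing.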
Install the required packages:
pip install -r requirements.txt
# SearchParams:
endpoint: 'archive' # 'archive' or 'news'
domains: ['7news', 'skynewsau', 'sbs', ..., 'smh', 'thewest', 'theage', 'couriermail']
query: '*'
date_from: '2024-03-01'
date_to: '2024-03-03'
country: 'au' # Examples: au=Australia, de=Germany
language: 'en' # Examples: en=English, de=German
# Google BigQuery Params
project_name: 'your-gbqproject'
dataset_name: 'newsdataio_data'
tablename: 'newsdata_news_au'
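Mistakes in these settings (a misspelled endpoint, reversed dates) are easier to catch before any API call is made. The sketch below assumes the search parameters have already been loaded into a plain dict with the keys shown above; the `SearchParams` dataclass and `parse_search_params` function are hypothetical names used for illustration and may not match how this project handles its configuration.

```python
from dataclasses import dataclass
from datetime import date

# The two endpoint values named in the configuration comment above.
VALID_ENDPOINTS = {"archive", "news"}


@dataclass(frozen=True)
class SearchParams:
    endpoint: str
    domains: list
    query: str
    date_from: date
    date_to: date
    country: str
    language: str


def parse_search_params(raw: dict) -> SearchParams:
    """Validate a raw settings dict and convert the date strings to date objects."""
    if raw["endpoint"] not in VALID_ENDPOINTS:
        raise ValueError(f"endpoint must be one of {sorted(VALID_ENDPOINTS)}")
    # Dates are expected in ISO format (YYYY-MM-DD), as in the example values.
    date_from = date.fromisoformat(raw["date_from"])
    date_to = date.fromisoformat(raw["date_to"])
    if date_from > date_to:
        raise ValueError("date_from must not be later than date_to")
    return SearchParams(
        endpoint=raw["endpoint"],
        domains=list(raw["domains"]),
        query=raw["query"],
        date_from=date_from,
        date_to=date_to,
        country=raw["country"],
        language=raw["language"],
    )
```

Converting the dates up front means a typo such as `2024-3-01` fails immediately with a clear error rather than producing an empty result set later.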
This will call collector.py to gather news articles from the NewsDataIO API and push the results to Google BigQuery.
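To make the two halves of that flow more concrete, the sketch below shows how a request URL and a BigQuery table identifier could be assembled from the settings above. It is not the code in collector.py: the function names are hypothetical, and the base URL and the query parameter names (`apikey`, `q`, `domain`, `from_date`, `to_date`) are assumptions that should be checked against https://newsdata.io/documentation. No network call is made here.

```python
from urllib.parse import urlencode

# Assumed base URL; verify against the NewsData documentation.
BASE_URL = "https://newsdata.io/api/1"


def build_request_url(endpoint, api_key, *, query, country, language,
                      domains, date_from=None, date_to=None):
    """Assemble a request URL from the search settings (hypothetical helper)."""
    params = {
        "apikey": api_key,
        "q": query,
        "country": country,
        "language": language,
        "domain": ",".join(domains),  # assumed: comma-separated domain list
    }
    # Date bounds are optional; parameter names are assumptions.
    if date_from:
        params["from_date"] = date_from
    if date_to:
        params["to_date"] = date_to
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


def table_id(project_name, dataset_name, tablename):
    """Build a fully qualified BigQuery table ID: project.dataset.table."""
    return f"{project_name}.{dataset_name}.{tablename}"
```

With the example settings, `table_id('your-gbqproject', 'newsdataio_data', 'newsdata_news_au')` gives the destination string that BigQuery client libraries generally expect when loading rows into a table.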
TODO