NewsdataIO News Gatherer

Uses the NewsDataIO API to request news articles and pushes the results to a specified Google BigQuery table.

For more details, go to: https://newsdata.io/documentation

What you will need

  • Python 3.10 or newer
  • A Google BigQuery service account
  • A NewsDataIO API key

How to use this tool

1. Set your environment variables.

In PyCharm, this can be done by going to Run -> Edit Configurations -> Environment Variables.

img_1.png

You will need:

Name               Value
gbq_servicekey     /path/to/your/credentials.json
newsdata_apikey    your NewsDataIO API key (value)

Ensure that the names of your environment variables are exactly as above, and that your path and key values are correct for your specific setup:

img.png
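
If you are not using PyCharm, or want to confirm the variables are visible to Python before running the tool, a quick check along these lines may help. This is a minimal sketch, not part of the repository: the helper names `missing_env_vars` and `credentials_file_exists` are hypothetical, and only the two variable names come from the table above.

```python
import os

# Variable names as listed in the table above.
REQUIRED_VARS = ("gbq_servicekey", "newsdata_apikey")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def credentials_file_exists(env=None):
    """Return True if gbq_servicekey points to an existing file."""
    env = os.environ if env is None else env
    path = env.get("gbq_servicekey", "")
    return bool(path) and os.path.isfile(path)


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    elif not credentials_file_exists():
        print("gbq_servicekey does not point to an existing file")
    else:
        print("Environment looks OK")
```

Run it from the same run configuration (or shell) you will use for the tool, since environment variables set in one PyCharm configuration are not visible to others.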


2. Install requirements

pip install -r requirements.txt


3. Enter your config details in config.yml. An example of a valid configuration is shown below:

# SearchParams:

endpoint: 'archive'     # 'archive' or 'news'
domains: ['7news', 'skynewsau', 'sbs', ..., 'smh', 'thewest', 'theage', 'couriermail']

query: '*'
date_from: '2024-03-01'
date_to: '2024-03-03'

country: 'au'       # Examples: au=Australia, de=Germany
language: 'en'      # Examples: en=English, de=German

# Google BigQuery Params
project_name: 'your-gbqproject'
dataset_name: 'newsdataio_data'
tablename: 'newsdata_news_au'
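
Before running the tool, it can be useful to sanity-check the values you entered. The sketch below is an illustration only and is not taken from the repository: `validate_config` and `EXAMPLE_CONFIG` are hypothetical names, it assumes the configuration has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`), and it assumes every key in the example above is required, which the tool itself may not enforce.

```python
from datetime import date

# Assumption: every key shown in the example configuration is required.
REQUIRED_KEYS = (
    "endpoint", "domains", "query", "date_from", "date_to",
    "country", "language", "project_name", "dataset_name", "tablename",
)

# The example configuration as a dict, with a shortened domains list.
EXAMPLE_CONFIG = {
    "endpoint": "archive",
    "domains": ["7news", "sbs"],
    "query": "*",
    "date_from": "2024-03-01",
    "date_to": "2024-03-03",
    "country": "au",
    "language": "en",
    "project_name": "your-gbqproject",
    "dataset_name": "newsdataio_data",
    "tablename": "newsdata_news_au",
}


def validate_config(cfg):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in cfg]
    if cfg.get("endpoint") not in ("archive", "news"):
        problems.append("endpoint must be 'archive' or 'news'")
    try:
        # str() also handles unquoted YAML dates, which load as date objects.
        start = date.fromisoformat(str(cfg.get("date_from")))
        end = date.fromisoformat(str(cfg.get("date_to")))
        if start > end:
            problems.append("date_from is after date_to")
    except ValueError:
        problems.append("date_from and date_to must be YYYY-MM-DD dates")
    return problems


if __name__ == "__main__":
    print(validate_config(EXAMPLE_CONFIG) or "Config looks OK")
```

Returning a list of problems rather than raising on the first one lets you fix several mistakes in config.yml in a single pass.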

4. Run run_newsio_gather.py

This will call collector.py to gather news articles from the NewsDataIO API and push the results to Google BigQuery.
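
This README does not describe the internals of collector.py, so the following only illustrates how the three Google BigQuery values in config.yml would typically combine. BigQuery addresses a table as `project.dataset.table`; the function name `bigquery_table_id` is hypothetical and is not claimed to exist in the repository.

```python
def bigquery_table_id(cfg):
    """Build a fully qualified BigQuery table ID from config values."""
    return f"{cfg['project_name']}.{cfg['dataset_name']}.{cfg['tablename']}"


if __name__ == "__main__":
    # Values taken from the example configuration in step 3.
    example = {
        "project_name": "your-gbqproject",
        "dataset_name": "newsdataio_data",
        "tablename": "newsdata_news_au",
    }
    print(bigquery_table_id(example))
```

Checking this ID against the BigQuery console is a quick way to confirm where the gathered articles should appear.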

Data output

TODO