The scraper prepares organisation.data.xml.csv from publishers' organisation XML files,
and publishers.data.scrapping.csv from publisher information in the IATI Registry.
For each organisation record, the script performs the following checks (see OrganisationCollection>checkAndUpdate):
- whether the organisation-list part of the identifier is valid according to org-id.guide
- whether the organisation identifier is present in the IATI organisation codelist
- if the identifier already exists, its metadata is updated when something has changed
- if the name already exists, that organisation is ignored and the identifier that was saved first is kept
- otherwise, the data is added to the CSV list for import into the database
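
The checks above can be sketched roughly as follows. This is an illustrative stand-in, not the actual OrganisationCollection code: the field names, the returned status strings, the rule that the list code is the first two hyphen-separated segments of the identifier, and the choice to accept an identifier when either validity check passes are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class OrganisationCollection:
    """Hypothetical in-memory version of the collection (names are assumed)."""
    valid_lists: set   # list codes taken from org-id.guide, e.g. {"GB-COH"}
    iati_codes: set    # identifiers from the IATI organisation codelist
    by_identifier: dict = field(default_factory=dict)
    by_name: dict = field(default_factory=dict)

    def check_and_update(self, identifier, name, metadata):
        # Assumption: the list code is the first two hyphen-separated segments.
        parts = identifier.split("-")
        list_code = "-".join(parts[:2])
        list_ok = len(parts) >= 3 and list_code in self.valid_lists

        # Assumption: passing either validity check is enough.
        if not (list_ok or identifier in self.iati_codes):
            return "invalid"

        # Known identifier: refresh the metadata only if it changed.
        if identifier in self.by_identifier:
            record = self.by_identifier[identifier]
            if record["metadata"] != metadata:
                record["metadata"] = metadata
                return "updated"
            return "unchanged"

        # Known name under a different identifier: keep the first one saved.
        if name in self.by_name:
            return "duplicate-name"

        # New organisation: queue it for the CSV list.
        record = {"identifier": identifier, "name": name, "metadata": metadata}
        self.by_identifier[identifier] = record
        self.by_name[name] = record
        return "added"
```

Returning a status string rather than raising makes it easy to count how many records fell into each case during a run.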
- Sources are in src/cleanup
- Run python initial_cleanup.py to clean up the organisation data
It reads data/organisation.data.xml.csv and data/publishers.data.scrapping.csv and generates out/organisations-clean.csv,
which contains only the valid organisations' information.
organisations-clean.csv is then cleaned up manually if needed.
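
As a rough picture of the data flow in this step, the sketch below merges several input CSVs into one output file. It is not the logic of initial_cleanup.py: the column names identifier and name are assumed, and the only cleaning done here is dropping rows without an identifier and keeping the first row seen for each identifier.

```python
import csv
from pathlib import Path


def merge_clean(inputs, output, key="identifier"):
    """Merge CSV files into one, keeping the first row seen per identifier.

    Column names ("identifier", "name") are assumptions for illustration.
    """
    seen = {}
    for path in inputs:
        with open(path, newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                ident = (row.get(key) or "").strip()
                # Skip rows with no identifier and identifiers already kept.
                if ident and ident not in seen:
                    seen[ident] = {
                        "identifier": ident,
                        "name": (row.get("name") or "").strip(),
                    }

    # Create the output directory (e.g. out/) if it does not exist yet.
    output = Path(output)
    output.parent.mkdir(parents=True, exist_ok=True)
    with open(output, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["identifier", "name"])
        writer.writeheader()
        writer.writerows(seen.values())
    return len(seen)
```

Under these assumptions, a call could look like merge_clean(["data/organisation.data.xml.csv", "data/publishers.data.scrapping.csv"], "out/organisations-clean.csv").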
- Sources are in src/dump
- Copy config.py.bak to config.py
- Create a PostgreSQL database and update config.py with its credentials
- Run python dump.py, which reads organisations-clean.csv and dumps the data into the database you have just created
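
The dump step amounts to reading the cleaned CSV and inserting its rows through a database connection. The sketch below shows one way this might look; it is not the contents of dump.py. The table name organisations, the columns identifier and name, and both function names are hypothetical. It works with any Python DB-API connection, so the placeholder style is a parameter.

```python
import csv


def read_rows(csv_path):
    """Read (identifier, name) tuples from the cleaned CSV (columns assumed)."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return [(r["identifier"], r["name"]) for r in csv.DictReader(fh)]


def dump_rows(conn, rows, placeholder="%s"):
    """Insert rows through a DB-API connection.

    The table and column names are hypothetical. "%s" is the parameter
    style used by psycopg2; sqlite3, for example, expects "?" instead.
    """
    sql = (
        "INSERT INTO organisations (identifier, name) "
        f"VALUES ({placeholder}, {placeholder})"
    )
    cur = conn.cursor()
    cur.executemany(sql, rows)  # values are passed as parameters, not formatted in
    conn.commit()
    return len(rows)
```

With PostgreSQL, conn would typically come from psycopg2.connect(dbname=..., user=..., password=..., host=...), with the values taken from config.py. The setting names inside config.py are not shown in this document, so they are left open here.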