
Catalog CKAN 2.8 upgrade

James Brown edited this page Mar 16, 2022 · 13 revisions

This page is no longer up to date. Please see #1461.

This page describes our ongoing notes and planning around upgrading Catalog to CKAN 2.8.

Background

Moving catalog.data.gov from CKAN 2.3 to CKAN 2.8 is a large jump, with many changes to the code base and extensions. Testing and development must be finished before a review for staging and production can be completed.

Goals

Create a working version of the CKAN 2.8 application for catalog.data.gov, with the necessary extensions installed and updated.

Assumptions

  • Start from a fresh database state (PostgreSQL, Solr, etc.)
  • Populate datasets by first importing existing harvest sources and then running the harvesters
  • Local testing and development will be done in catalog-app via Docker for the various libraries and CKAN extensions
  • Configuration will have to move to catalog-app for proper version control
  • data.gov master will still be used for deploying to production

Upgrade Development

Using Docker, we will install a new working version of CKAN 2.8 along with the necessary services (Solr, PostgreSQL, etc.). The development architecture will not match the current data.gov architecture; merging into the current architecture will happen later.

We will use the latest CKAN 2.8 release. All CKAN extensions will use the original open source (upstream) version where applicable, to simplify future maintenance. The code bases currently in use are listed in the table under CKAN Extensions; these are the GSA versions, to be replaced by the original upstream versions where applicable.

CKAN Extensions

To test the extensions, we will create a base CKAN 2.8 Dockerfile. We will then create a separate layer for each extension, so that extensions can easily be tested separately or in combination.
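The layering idea could be sketched as a small generator that emits one Dockerfile layer per extension on top of a shared base image. This is an illustration only: the base image name `catalog-ckan-base:2.8`, the `UPSTREAMS` mapping, and the function name are hypothetical, and the `pip install -e git+...#egg=...` form is assumed here because it is the usual way to install a CKAN extension from source.

```python
# Sketch: build Dockerfile text with one RUN layer per extension, so any
# subset of extensions can be tested together or on its own.
# ASSUMPTION: "catalog-ckan-base:2.8" is a hypothetical name for the base
# CKAN 2.8 image; it is not defined anywhere on this page.
BASE_IMAGE = "catalog-ckan-base:2.8"

# A few upstream repositories taken from the extension table on this page.
UPSTREAMS = {
    "ckanext-harvest": "https://github.com/ckan/ckanext-harvest",
    "ckanext-spatial": "https://github.com/ckan/ckanext-spatial",
    "ckanext-archiver": "https://github.com/ckan/ckanext-archiver",
}


def render_dockerfile(extensions, base_image=BASE_IMAGE):
    """Return Dockerfile text that installs each extension in its own layer."""
    lines = ["FROM {}".format(base_image)]
    for name in extensions:
        repo = UPSTREAMS[name]  # raises KeyError for an unknown extension
        lines.append("RUN pip install -e git+{}.git#egg={}".format(repo, name))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Render an image definition that combines two extensions.
    print(render_dockerfile(["ckanext-harvest", "ckanext-spatial"]))
```

Calling `render_dockerfile` with a single name gives an image for testing that extension in isolation; passing several names gives a combined image.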

| Staging | Dev | Name | CKAN compatibility | Upstream | Testing | Notes |
|---|---|---|---|---|---|---|
| | | ckanext-archiver | 2.1+ | https://github.com/ckan/ckanext-archiver | https://github.com/ckan/ckanext-archiver#testing | |
| | | ckanext-datagovtheme | any | N/A | None | |
| | | ckanext-datajson | | https://github.com/okfn/ckanext-datajson, or https://github.com/ViderumGlobal/ckanext-datajson | https://github.com/ViderumGlobal/ckanext-datajson/tree/datagov/ckanext/datajson/tests (exists, but no documentation) | |
| | | ckanext-extlink | | N/A | None | |
| | | ckanext-geodatagov | | N/A | https://github.com/GSA/ckanext-geodatagov/tree/master/tests (very incomplete coverage, possibly sample only?) | |
| | | ckanext-googleanalyticsbasic | | N/A | None | |
| | | ckanext-harvest | 2.x | https://github.com/ckan/ckanext-harvest | https://github.com/ckan/ckanext-harvest#tests | Has some branches which may contain fixes. |
| | | ckanext-saml2 | | https://github.com/okfn/ckanext-saml2 | None | The fork (maybe upstream) has a dependency on repoze.who==1.0.18, which conflicts with CKAN (repoze.who==2.0). |
| | | ckanext-qa | | https://github.com/ckan/ckanext-qa | https://github.com/ckan/ckanext-qa#tests | |
| | | ckanext-report | | https://github.com/datagovuk/ckanext-report | None | |
| | | ckanext-spatial | | https://github.com/ckan/ckanext-spatial | https://github.com/ckan/ckanext-spatial/tree/master/ckanext/spatial/tests (no documentation, seems robust) | |
| | | ckanext-ga-report | | https://github.com/datagovuk/ckanext-ga-report | https://github.com/datagovuk/ckanext-ga-report/tree/master/ckanext/ga_report/tests (no documentation, limited scope) | |
| | | PyZ3950 | | N/A | https://github.com/GSA/PyZ3950/tree/master/test (no documentation, very limited in scope) | |
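The repoze.who conflict noted for ckanext-saml2 is the kind of problem that can be caught before an image is built. The following is a minimal sketch, assuming each code base lists its exact pins as `name==version` lines; the helper name and the input layout are hypothetical.

```python
from collections import defaultdict


def find_pin_conflicts(requirements_by_source):
    """Return packages that are pinned to more than one version.

    requirements_by_source maps a source name (e.g. "ckan") to a list of
    requirement lines. Only exact pins of the form "name==version" are
    considered; comments, blank lines and other specifiers are ignored.
    The result maps each conflicting package to {source: version}.
    """
    pins = defaultdict(dict)  # package -> {source: version}
    for source, lines in requirements_by_source.items():
        for line in lines:
            line = line.split("#", 1)[0].strip()  # drop trailing comments
            if "==" not in line:
                continue
            name, version = line.split("==", 1)
            pins[name.strip().lower()][source] = version.strip()
    return {
        pkg: by_source
        for pkg, by_source in pins.items()
        if len(set(by_source.values())) > 1
    }


if __name__ == "__main__":
    # The two pins below come from the ckanext-saml2 note in the table.
    print(find_pin_conflicts({
        "ckan": ["repoze.who==2.0"],
        "ckanext-saml2": ["repoze.who==1.0.18"],
    }))
```

Run against the full set of requirements files, a non-empty result would flag extensions that need a fork, a patch, or an upstream fix before they can share an image with CKAN 2.8.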

The new version will also include the Solr and PostgreSQL upgrades required by the latest CKAN installation best practices.

Level of effort

The level of effort is documented elsewhere; a summary follows.

| Epics | Total hours | Developer hours | QA hours | Deadlines |
|---|---|---|---|---|
| A - BETA: Docker Environment | 22 | 20 | 2 | 12/6/2018 |
| B - BETA: CKAN2.8 Base | 17.6 | 16 | 1.6 | 12/10/2018 |
| C - BETA: CKAN2.8 Tests | 13.2 | 12 | 1.2 | 12/12/2018 |
| D - BETA: CKAN2.8 Extensions | 237.6 | 216 | 21.6 | 1/28/2019 |
| E - BETA: CKAN2.8 Tests | 17.6 | 16 | 1.6 | 1/30/2019 |
| F - BETA: CKAN2.8 Deployment (if time) | 79.2 | 72 | 7.2 | 2/11/2019 |
| Grand Total | 387.2 | 352 | 35.2 | 2/11/2019 |

The task scope covers all of epics A-E, and there may be time to additionally support some deployment activities (F).
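The figures in the summary are consistent with QA hours being 10% of developer hours and total hours being the sum of the two. That ratio is inferred from the numbers rather than stated on this page, so the sketch below treats it as an assumption and simply recomputes the totals for a chosen set of epics.

```python
from fractions import Fraction

# Developer hours per epic, copied from the level-of-effort summary.
DEV_HOURS = {"A": 20, "B": 16, "C": 12, "D": 216, "E": 16, "F": 72}

# ASSUMPTION: QA is 10% of developer hours. This is inferred from the
# summary figures, not stated by the source.
QA_RATIO = Fraction(1, 10)


def totals(epics, qa_ratio=QA_RATIO):
    """Return (total_hours, developer_hours, qa_hours) for the given epics."""
    dev = sum(DEV_HOURS[e] for e in epics)
    qa = dev * qa_ratio  # exact arithmetic, converted to float on return
    return float(dev + qa), dev, float(qa)


if __name__ == "__main__":
    print(totals(["A", "B", "C", "D", "E", "F"]))  # all epics
    print(totals(["A", "B", "C", "D", "E"]))       # without deployment (F)
```

Under this assumption, all six epics reproduce the grand total of 387.2 hours, and epics A-E alone come to 308 hours.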

Rollout plan for production (rough outline)

The working version of the catalog-app development environment will be captured with pip freeze. We can then use the resulting file to build the CKAN 2.8 app on staging and development.
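Once a freeze file exists, it can also be used to check that a built environment matches it. The sketch below is one way to do that, assuming the file contains ordinary `pip freeze` output; the function names are hypothetical, and editable installs (`-e ...` lines, which is how CKAN extensions are often installed) are skipped rather than compared.

```python
import subprocess
import sys


def parse_freeze(text):
    """Parse `pip freeze` output into {package_name: version}.

    Only "name==version" lines are kept. Comments, editable installs
    ("-e ...") and direct references ("name @ url") are skipped.
    """
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "-e ")) or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


def diff_freeze(expected, actual):
    """Return {name: (expected_version, actual_version)} for mismatches.

    A version of None means the package is missing on that side.
    """
    names = set(expected) | set(actual)
    return {
        n: (expected.get(n), actual.get(n))
        for n in sorted(names)
        if expected.get(n) != actual.get(n)
    }


def current_freeze():
    """Run `pip freeze` for the interpreter executing this script."""
    out = subprocess.check_output([sys.executable, "-m", "pip", "freeze"])
    return parse_freeze(out.decode("utf-8"))
```

On a staging host, comparing `parse_freeze` of the saved file against `current_freeze()` with `diff_freeze` should return an empty dict if the environment was rebuilt faithfully.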

  1. Create new instances of RDS and Solr hosts
  2. Create new web and harvester instances that can be used to populate and view the CKAN 2.8 application
  3. Populate the CKAN 2.8 instance with existing harvest sources
  4. Run the harvesters to populate the instance
  5. Verify the CKAN 2.8 instance is working correctly
  6. Scale out the CKAN 2.8 instance to production scale
  7. Cut over the ELB/NetScaler apps to point to CKAN 2.8
  8. Remove old CKAN instances (web, harvesters, Solr, RDS)
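Part of the verification in step 5 could be automated with CKAN's `status_show` API action, which reports the running CKAN version and the loaded plugins. The following is a minimal sketch of such a smoke check, not a full acceptance test: it does not look at harvested data, the function names are hypothetical, and the plugin names to require depend on the site's `ckan.plugins` setting.

```python
import json
from urllib.request import urlopen


def check_status(payload, expected_prefix="2.8", required_extensions=()):
    """Return a list of problems found in a status_show response dict."""
    if not payload.get("success"):
        return ["API call did not succeed"]
    problems = []
    result = payload.get("result", {})
    version = result.get("ckan_version", "")
    # Accept "2.8" itself or any "2.8.x" release, but not e.g. "2.80".
    if not (version == expected_prefix
            or version.startswith(expected_prefix + ".")):
        problems.append("unexpected CKAN version: {!r}".format(version))
    loaded = set(result.get("extensions", []))
    for ext in required_extensions:
        if ext not in loaded:
            problems.append("plugin not loaded: {}".format(ext))
    return problems


def fetch_status(base_url):
    """GET /api/3/action/status_show from a CKAN site and decode the JSON."""
    url = base_url.rstrip("/") + "/api/3/action/status_show"
    with urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A check before cutover might call `check_status(fetch_status(site_url), required_extensions=["harvest"])`, where `site_url` is the address of the new instance, and proceed only if the returned list is empty.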