Project milestone Jun20

This plan refers to the proposed architecture described in this GoogleDoc

Unless otherwise stated, completing each step requires a machine-readable pass/fail test to verify that it works.

Milestone Jun20 - target end of June 2020

Project planning

Architecture design document

Service definitions, outline of functionality.

Overview of what each service does.
Outline of the REST interfaces.
Deliverable: A wiki page describing the services. (issue #129)

Milestone documents

Describe 6 milestones, one for each month.

Deliverable: A wiki page for each milestone. (issue #132)

User experience

What is the minimum viable product ?

Deliverable: A wiki page describing the MVP. (issue #130)

Evaluate GAVIP portal https://gavip.esac.esa.int/

Document the user experience from login to running a notebook.
How does a new user get access ?
What resources can a user request ?
What happens if the resources are not available ?
Can we run some of our example notebooks on this platform ?
Deliverable: A wiki page describing the GAVIP portal. (issue #135)

Evaluate LSP (inc. DASK panel)

Document the user experience from login to running a notebook.
How does a new user get access ?
What resources can a user request ?
What happens if the resources are not available ?
Can we run some of our example notebooks on this platform ?
Deliverable: A wiki page describing the LSP user experience. (issue #131)

Expand the user experience section in the design document

Document the user experience from login to running a notebook.
How does a new user get access ?
What resources can a user request ?
What happens if the resources are not available ?
Deliverable: Updated section in the design document. (issue #n)

Describe the interaction between our portal and Zeppelin.

Describe the user experience.
Outline the technical details of how we implement it.
Deliverable: A wiki page to describe the interaction. (issue #n)

Can we use Zeppelin components without a portal ?

What are the pros and cons of each ?
List the components we will need to provide the user experience we want.
How many of them are available in Zeppelin ?
How many of them would we need to modify or develop ourselves ?
How many would we need to modify or develop before the cost/benefit favours a separate portal site ?
Deliverable: A wiki page to address this. (issue #n)

Authentication

How do we handle name/pass authentication ?

How do we handle name/pass authentication ?
How do new users register ?
Deliverable: A wiki page describing name/pass login and registration. (issue #n)

How do we handle OAuth authentication ?

How do we handle OAuth authentication ?
How do new users register ?
Deliverable: A wiki page describing OAuth login and registration (issue #n)

Can we use IRIS IAM as one of our OAuth service providers ?

https://indigo-iam.github.io/docs/v/current/user-guide
Deliverable: A wiki page looking at IRIS IAM. (issue #n)

How do we propagate identity between services ?

Can we use JSON Web Tokens (https://jwt.io/) ?
Can we make use of the work done by Sara et al at INAF ?
Deliverable: A wiki page describing inter-service tokens. (issue #n)
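
To make the JSON Web Token question concrete, the following is a minimal sketch of how one service could issue a signed token and another could verify it, using only the Python standard library. It assumes a shared secret and the HS256 algorithm. The function names, the claim set and the five minute lifetime are illustrative assumptions, not decisions made in this plan, and a real deployment would more likely use an existing JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    """Base64url-encode bytes without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    """Decode base64url text, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(username, secret, lifetime=300, now=None):
    """Build an HS256-signed token carrying the user's identity (hypothetical claims)."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": username, "iat": issued, "exp": issued + lifetime}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(token, secret, now=None):
    """Return the claims if the signature and expiry are valid, otherwise None."""
    try:
        header_b64, claims_b64, signature_b64 = token.split(".")
        signing_input = (header_b64 + "." + claims_b64).encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(claims_b64))
    except ValueError:
        return None  # Malformed token: wrong part count, bad base64 or bad JSON.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    if not isinstance(claims, dict):
        return None
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:
        return None
    return claims


token = issue_token("albert", b"shared-secret")
print(verify_token(token, b"shared-secret"))
```

A shared secret keeps the sketch short, but it means every service that can verify a token can also forge one. An asymmetric algorithm would avoid that and is one of the points the wiki page could address.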

Authorisation

How do we declare what a user is allowed to access (policy) ?

Deliverable: A wiki page describing access policies (issue #n)

How do we implement the access rules (implementation) ?

Deliverable: A wiki page describing access control (issue #n)
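
As a starting point for discussion, the split between policy and implementation could be sketched as a declarative data structure plus a small function that evaluates it. Everything in the sketch below is an assumption made for illustration: the role names, the package lists, the booking limits and the `is_allowed` function are hypothetical, and the plan does not yet define any of them.

```python
# Hypothetical policy: role names, package lists and limits are illustrative only.
POLICY = {
    "roles": {
        "guest": {"packages": ["tiny"], "max_booking_days": 1},
        "member": {"packages": ["tiny", "small", "medium"], "max_booking_days": 28},
    },
    "users": {"albert": "member", "visitor": "guest"},
}


def is_allowed(policy, username, package, days):
    """Return True if the user's role permits booking this package for this long."""
    role = policy["users"].get(username)
    if role is None:
        return False  # Unknown users are denied by default.
    rules = policy["roles"][role]
    return package in rules["packages"] and days <= rules["max_booking_days"]


print(is_allowed(POLICY, "albert", "medium", 14))
print(is_allowed(POLICY, "visitor", "medium", 1))
```

Keeping the policy as plain data means it could be stored alongside the YAML|JSON object templates and reviewed without reading the code that enforces it.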

Object templates

Example YAML|JSON templates for system objects

User account
Resource packages [tiny|small|medium|large]
Zeppelin configuration
Spark configuration
Storage resources for a user
A quick (tiny for a day) resource booking
A longer (medium for weeks) resource booking

Deliverable: A set of YAML|JSON templates in git (issue #n)
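
The sketch below shows one possible shape for the resource package and booking templates, generated from Python and serialised as JSON. The field names, the core and memory figures and the `make_booking` helper are all hypothetical. The plan names the four package sizes and the two example bookings but does not specify their contents.

```python
import json
from datetime import datetime, timedelta

# Hypothetical sizes: the plan names the packages but not what they contain.
RESOURCE_PACKAGES = {
    "tiny": {"cores": 2, "memory_gb": 4},
    "small": {"cores": 4, "memory_gb": 8},
    "medium": {"cores": 16, "memory_gb": 32},
    "large": {"cores": 64, "memory_gb": 128},
}


def make_booking(username, package, start, duration):
    """Build a booking object for one user and one resource package."""
    if package not in RESOURCE_PACKAGES:
        raise ValueError(f"unknown resource package: {package}")
    return {
        "kind": "resource-booking",
        "user": username,
        "package": package,
        "resources": RESOURCE_PACKAGES[package],
        "start": start.isoformat(),
        "end": (start + duration).isoformat(),
    }


# A quick (tiny for a day) booking and a longer (medium for weeks) booking.
start = datetime(2020, 6, 1, 9, 0)
quick = make_booking("albert", "tiny", start, timedelta(days=1))
longer = make_booking("albert", "medium", start, timedelta(weeks=3))

print(json.dumps(quick, indent=2))
print(json.dumps(longer, indent=2))
```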

Data storage

Shared instances of Gaia DR2 in CSV and Parquet files

Swift S3 share. (issue #137)
Manila CephFS share. (issue #136)
HDFS on Cinder volumes. (issue #62)
NFS on Cinder volumes. (issue #62)

Deliverable: A single set of notes describing how to create the different datasets. (issue #n)
Deliverable: A set of performance tests for different file systems. (issue #138)
Deliverable: Automated process for downloading, building and publishing the data. (issue #n)
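
A performance test for the different file systems could begin with something as simple as timing a sequential read. The sketch below is a minimal example under stated assumptions: the `measure_read` function is hypothetical, and it only covers storage that is reachable through a local mount point. Object stores and HDFS would need their own client libraries, and a realistic test would also have to account for the operating system's page cache.

```python
import os
import tempfile
import time


def measure_read(path, chunk_size=1024 * 1024):
    """Read a file sequentially and return (bytes_read, seconds, MB per second)."""
    total = 0
    started = time.perf_counter()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - started
    rate = (total / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate


# Self-contained demonstration using a temporary file. A real test would point
# at a file on the mounted share under evaluation.
with tempfile.NamedTemporaryFile(delete=False) as sample:
    sample.write(os.urandom(4 * 1024 * 1024))
    sample_path = sample.name

size, seconds, rate = measure_read(sample_path)
os.remove(sample_path)
print(f"read {size} bytes in {seconds:.4f}s ({rate:.1f} MB/s)")
```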

Spark API

Invoking Spark jobs via the Spark command line interface.

Is there a Python client for the Spark command line interface ?
Deliverable: A wiki page describing how we can use the Spark command line interface (issue #n)
- For our own development work
- For our own integration testing
- Do we allow end users to access it ?
  - Probably not in this version
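
One low-cost way to drive the Spark command line interface from Python, for development and integration testing, is to build a `spark-submit` command and run it as a subprocess. The sketch below assumes `spark-submit` is on the PATH. The helper names, the `hello_world.py` script and the `local[2]` master are placeholders chosen for illustration.

```python
import shlex
import subprocess


def build_spark_submit(application, master, name, conf=None, app_args=()):
    """Assemble a spark-submit argument list for a single application."""
    command = ["spark-submit", "--master", master, "--name", name]
    for key, value in sorted((conf or {}).items()):
        command += ["--conf", f"{key}={value}"]
    command.append(application)
    command.extend(app_args)
    return command


def run_spark_job(command):
    """Run the job and map its exit code onto a PASS/FAIL result."""
    result = subprocess.run(command, capture_output=True, text=True)
    return "PASS" if result.returncode == 0 else "FAIL"


# Hypothetical job: the script name and settings are placeholders.
command = build_spark_submit(
    "hello_world.py",
    master="local[2]",
    name="hello-world",
    conf={"spark.executor.memory": "2g"},
)
print(shlex.join(command))
```

Calling `run_spark_job(command)` would execute the job. It is left out of the demonstration so that the sketch runs on a machine without Spark installed.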

Evaluate Apache Livy https://livy.incubator.apache.org/

Does this help with what we need ?
How does this integrate with Zeppelin ?
Deliverable: A wiki page describing Apache Livy (issue #n)
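
For comparison with the command line approach, Livy accepts batch jobs as JSON sent to its REST interface with `POST /batches`. The sketch below only builds the request and does not send it. The server URL, the application path and the `build_livy_batch_request` helper are hypothetical, and the set of fields we would actually need is something the evaluation should confirm.

```python
import json
import urllib.request


def build_livy_batch_request(livy_url, application, name, args=()):
    """Build (but do not send) a request that submits a batch job to Livy."""
    payload = {"file": application, "name": name, "args": list(args)}
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server and application path, for illustration only.
request = build_livy_batch_request(
    "http://livy.example.org:8998",
    "hdfs:///examples/hello_world.py",
    "hello-world",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```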

Science examples

todo

Technical examples

todo

Project infrastructure

Continuous Integration

Prototype of a component running inside the Cambridge cloud

'hello world' test to create a server in OpenStack
'hello world' test to run a task in Spark
'hello world' test to run a task in Zeppelin

PASS/FAIL status reporting for each test
Deliverable: Working code in git for the 'hello world' tests (issue #n)
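
The PASS/FAIL reporting could take the form of a small runner that executes each test and emits a machine-readable report. In the sketch below the three test functions are hypothetical stand-ins that do no real work, and one of them fails deliberately to show how a failure is recorded. The report format is an assumption.

```python
import json


def run_checks(checks):
    """Run each named check and record PASS or FAIL with any error message."""
    report = []
    for name, check in checks:
        try:
            check()
            report.append({"test": name, "status": "PASS"})
        except Exception as error:
            report.append({"test": name, "status": "FAIL", "error": str(error)})
    return report


# Hypothetical stand-ins for the three 'hello world' tests.
def create_openstack_server():
    pass


def run_spark_task():
    pass


def run_zeppelin_task():
    raise RuntimeError("Zeppelin is not reachable")


report = run_checks([
    ("openstack-hello-world", create_openstack_server),
    ("spark-hello-world", run_spark_task),
    ("zeppelin-hello-world", run_zeppelin_task),
])
print(json.dumps(report, indent=2))

# A non-zero exit code lets a calling script detect failure without parsing JSON.
exit_code = 0 if all(entry["status"] == "PASS" for entry in report) else 1
print("exit code:", exit_code)
```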

Prototype gateway component from GitHub to the internal system

PASS/FAIL status reporting propagated back to GitHub.
Deliverable: Working code in git to launch the 'hello world' tests from GitHub. (issue #n)
Deliverable: Wiki page discussing the security issues when triggered by a pull request. (issue #n)
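
Propagating a result back to GitHub can be done with the commit status API, which accepts a `POST` to `/repos/{owner}/{repo}/statuses/{sha}` with a state of `success`, `failure`, `error` or `pending`. The sketch below builds the request without sending it. The owner, repository, commit hash, token and the `build_status_request` helper are placeholders, and how the gateway obtains and protects the token is exactly the kind of security question the wiki page should cover.

```python
import json
import urllib.request

# Map our internal results onto the states the GitHub status API accepts.
STATE_FOR_RESULT = {"PASS": "success", "FAIL": "failure"}


def build_status_request(owner, repo, sha, result, token, context="hello-world"):
    """Build (but do not send) a commit status update for one test result."""
    payload = {
        "state": STATE_FOR_RESULT[result],
        "context": context,
        "description": f"{context}: {result}",
    }
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/statuses/{sha}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values throughout: none of these identify a real repository.
request = build_status_request(
    "example-owner", "example-repo", "0123abc", "PASS", token="not-a-real-token"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```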

Terraform deployment

Initial experiments using StackHPC examples.

Should we use Terraform as part of our manual deployments ?
- Probably not, issues with shared state.
Should we use Terraform as part of our automated deployments ?
- Probably not, issues with shared state.
Deliverable: Notes on how to get the examples working (issue #58)

Autoscaling Spark/K8s/Magnum deployment

Initial experiments using StackHPC examples.

Demonstrate autoscaling in response to load.
Question - Do we clone the StackHPC code or do we use it as a template to roll our own ?
Deliverable: Notes on how to get the examples working (issue #58)

System services

What do we use to deploy our system services ?

A Kubernetes cluster ?
A Terraform+Ansible deployment ?
A pure Ansible deployment ?
A shell scripted deployment ?

Deliverable: A wiki page describing the options available. (issue #n)

Local Docker registry

Automated deployment for a local Docker registry.

Deliverable: Notes in git to deploy a local registry. (issue #n)
- Using Ansible
- Using Kubernetes

Secret stores

Evaluate options for handling secrets between infrastructure layers.

OpenStack
Ansible
Terraform
Kubernetes
Docker
Webapps
Deliverable: A wiki page describing the options available (issue #n)
