
Project milestone Jun20

Zarquan edited this page Jun 23, 2020 · 19 revisions

This plan refers to the proposed architecture described in this Google Doc.

Unless otherwise stated, completing each step requires a machine-readable pass/fail test to verify that it works.

Milestone Jun20 - target end of June 2020

Project planning

Architecture design document

Service definitions, outline of functionality.

  • Overview of what each service does.
  • Outline of the REST interfaces.
  • Deliverable: A wiki page describing the services. (issue #129)

Milestone documents

Describe 6 milestones, one for each month.

  • Deliverable: A wiki page for each milestone. (issue #132)

User experience

What is the minimum viable product?

  • Deliverable: A wiki page describing the MVP. (issue #130)

Evaluate the GAVIP portal https://gavip.esac.esa.int/

  • Document the user experience from login to running a notebook.
  • How does a new user get access?
  • What resources can a user request?
  • What happens if the resources are not available?
  • Can we run some of our example notebooks on this platform?
  • Deliverable: A wiki page describing the GAVIP portal. (issue #135)

Evaluate the LSP (including the Dask panel)

  • Document the user experience from login to running a notebook.
  • How does a new user get access?
  • What resources can a user request?
  • What happens if the resources are not available?
  • Can we run some of our example notebooks on this platform?
  • Deliverable: A wiki page describing the LSP user experience. (issue #131)

Expand the user experience section in the design document

  • Document the user experience from login to running a notebook.
  • How does a new user get access?
  • What resources can a user request?
  • What happens if the resources are not available?
  • Deliverable: Updated section in the design document. (issue #n)

Describe the interaction between our portal and Zeppelin.

  • Describe the user experience.
  • Outline the technical details of how we implement it.
  • Deliverable: A wiki page to describe the interaction. (issue #n)

Can we use Zeppelin components without a portal?

  • What are the pros and cons of each approach?
  • List the components we will need to provide the user experience we want.
  • How many of them are available in Zeppelin?
  • How many of them would we need to modify or develop ourselves?
  • How many would we need to develop before the cost/benefit balance favours a separate portal site?
  • Deliverable: A wiki page to address this. (issue #n)

Authentication

How do we handle name/pass authentication?

  • How do new users register?
  • Deliverable: A wiki page describing name/pass login and registration. (issue #n)

How do we handle OAuth authentication?

  • How do new users register?
  • Deliverable: A wiki page describing OAuth login and registration (issue #n)

Can we use IRIS IAM as one of our OAuth service providers?

How do we propagate identity between services ?

  • Can we use JSON Web Tokens (https://jwt.io/)?
  • Can we make use of the work done by Sara et al. at INAF?
  • Deliverable: A wiki page describing inter-service tokens. (issue #n)
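As a rough illustration of what an inter-service token could look like, the sketch below creates and verifies an HS256-signed JSON Web Token using only the Python standard library. The shared secret, the service names and the choice of claims (`sub`, `aud`, `exp`, which are standard registered JWT claims) are assumptions for illustration. A real deployment would more likely use a maintained library such as PyJWT, and asymmetric keys so that services do not need to share a secret.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data: bytes) -> str:
    # JWTs use base64url encoding with the trailing '=' padding removed.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that the encoder stripped before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_token(secret: bytes, subject: str, audience: str, ttl: int = 300) -> str:
    """Create a compact HS256 JWT asserting `subject` to the `audience` service."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "aud": audience, "exp": int(time.time()) + ttl}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url_encode(signature)


def verify_token(secret: bytes, token: str, audience: str) -> dict:
    """Return the claims if signature, audience and expiry are valid; else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("aud") != audience:
        raise ValueError("wrong audience")
    # A token without an expiry time is treated as already expired.
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims
```

For example, a portal could call `make_token(secret, "alice", "zeppelin")` and the Zeppelin side would call `verify_token(secret, token, "zeppelin")` before trusting the user name in the `sub` claim.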

Authorisation

How do we declare what a user is allowed to access? (policy)

  • Deliverable: A wiki page describing access policies (issue #n)

How do we implement the access rules? (implementation)

  • Deliverable: A wiki page describing access control (issue #n)

Object templates

Example YAML|JSON templates for system objects

  1. User account
  2. Resource packages [tiny|small|medium|large]
  3. Zeppelin configuration
  4. Spark configuration
  5. Storage resources for a user
  6. A quick (tiny for a day) resource booking
  7. A longer (medium for weeks) resource booking
  • Deliverable: A set of YAML|JSON templates in git (issue #n)
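To make the template idea concrete, here is a sketch of how items 2, 6 and 7 might be expressed and checked in Python before being written out as JSON. The package sizes, field names and core/memory figures are invented placeholders, not agreed values; a YAML version would follow the same structure.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta

# Hypothetical resource packages; the numbers are placeholders, not agreed values.
PACKAGES = {
    "tiny": {"cores": 2, "memory_gib": 4},
    "small": {"cores": 8, "memory_gib": 16},
    "medium": {"cores": 32, "memory_gib": 64},
    "large": {"cores": 128, "memory_gib": 256},
}


@dataclass
class Booking:
    """A hypothetical resource booking template."""
    user: str
    package: str
    start: str  # ISO 8601 timestamp
    end: str    # ISO 8601 timestamp

    def validate(self) -> None:
        if self.package not in PACKAGES:
            raise ValueError(f"unknown package: {self.package}")
        if datetime.fromisoformat(self.end) <= datetime.fromisoformat(self.start):
            raise ValueError("booking must end after it starts")


def make_booking(user: str, package: str, start: datetime, duration: timedelta) -> Booking:
    """Build a booking from a start time and duration, and check it is valid."""
    booking = Booking(user, package, start.isoformat(), (start + duration).isoformat())
    booking.validate()
    return booking


start = datetime(2020, 6, 29, 9, 0)
quick = make_booking("alice", "tiny", start, timedelta(days=1))     # item 6
longer = make_booking("bob", "medium", start, timedelta(weeks=3))   # item 7
print(json.dumps({"packages": PACKAGES, "bookings": [asdict(quick), asdict(longer)]}, indent=2))
```

Keeping the templates behind a small validation step like this means the same checks could be reused by the services that accept bookings.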

Data storage

Shared instances of Gaia DR2 in CSV and Parquet files

  1. Swift S3 share. (issue #137)
  2. Manila CephFS share. (issue #136)
  3. HDFS on Cinder volumes. (issue #62)
  4. NFS on Cinder volumes. (issue #62)
  • Deliverable: A single set of notes describing how to create the different datasets. (issue #n)
  • Deliverable: A set of performance tests for different file systems. (issue #138)
  • Deliverable: Automated process for downloading, building and publishing the data. (issue #n)
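One possible starting point for the file system performance tests is a simple sequential read benchmark that can be pointed at the same Gaia DR2 file on each mount (S3 via a FUSE mount, CephFS, HDFS or NFS). The sketch below is an assumption about how such a test might look, not an agreed method. Note that reading a file that was just written mostly measures the page cache, so meaningful numbers need cold caches or files larger than memory.

```python
import os
import tempfile
import time


def read_throughput(path: str, block_size: int = 4 * 1024 * 1024) -> float:
    """Read a file sequentially in fixed-size blocks and return throughput in MiB/s."""
    total = 0
    started = time.perf_counter()
    # buffering=0 avoids an extra Python-level buffer on top of the OS reads.
    with open(path, "rb", buffering=0) as handle:
        while True:
            chunk = handle.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - started
    return (total / (1024 * 1024)) / elapsed if elapsed > 0 else float("inf")


# Demonstration on a local temporary file; a real test would instead loop over
# placeholder paths such as a Parquet file on each mounted file system.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(8 * 1024 * 1024))
sample_rate = read_throughput(tmp.name)
os.remove(tmp.name)
print(f"{sample_rate:.1f} MiB/s")
```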

Spark API

Invoking Spark jobs via the Spark command line interface.

  • Is there a Python client for the Spark command line interface?
  • Deliverable: A wiki page describing how we can use the Spark command line interface (issue #n)
    • For our own development work
    • For our own integration testing
    • Do we allow end users to access it?
      • Probably not in this version
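Whether a dedicated Python client exists is still an open question here. One low-cost option for our own development and integration testing is a thin wrapper around the `spark-submit` command using `subprocess`. The sketch below assembles the argument list from the standard `--master`, `--deploy-mode`, `--name` and `--conf` options and turns the exit status into a pass/fail result; the master URL, script name and configuration values are placeholders.

```python
import shlex
import subprocess
from typing import Dict, List, Optional


def spark_submit_command(
    app: str,
    master: str,
    name: str,
    deploy_mode: str = "client",
    conf: Optional[Dict[str, str]] = None,
    app_args: Optional[List[str]] = None,
) -> List[str]:
    """Assemble a spark-submit argument list without running it."""
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode, "--name", name]
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    # Everything after the application file is passed to the application itself.
    cmd.append(app)
    cmd += app_args or []
    return cmd


def run_spark_job(cmd: List[str]) -> bool:
    """Run spark-submit and return True (PASS) if it exits with status 0, else False (FAIL)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0


cmd = spark_submit_command(
    app="hello_world.py",                  # hypothetical test script
    master="spark://spark-master:7077",    # placeholder standalone master URL
    name="hello-world",
    conf={"spark.executor.memory": "2g"},
)
print(shlex.join(cmd))
```

Keeping command construction separate from execution lets the argument handling be unit tested without a Spark cluster.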

Evaluate Apache Livy https://livy.incubator.apache.org/

  • Does it provide what we need?
  • How does it integrate with Zeppelin?
  • Deliverable: A wiki page describing Apache Livy (issue #n)
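Livy exposes Spark through a REST interface; for example, a batch job is submitted by POSTing a JSON description to the `/batches` endpoint. The sketch below builds such a request with `urllib.request` but does not send it. The server address and script path are placeholders (8998 is Livy's default port), and only a few of the documented batch fields (`file`, `name`, `args`, `conf`) are used.

```python
import json
import urllib.request
from typing import Dict, List, Optional

LIVY_URL = "http://livy.example.org:8998"  # placeholder server address


def livy_batch_request(
    livy_url: str,
    file: str,
    name: str,
    args: Optional[List[str]] = None,
    conf: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /batches request for Apache Livy."""
    body = {"file": file, "name": name, "args": args or [], "conf": conf or {}}
    return urllib.request.Request(
        url=f"{livy_url.rstrip('/')}/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = livy_batch_request(
    LIVY_URL,
    file="/opt/jobs/hello_world.py",  # hypothetical script path
    name="hello-world",
    conf={"spark.executor.memory": "2g"},
)
# Sending it with urllib.request.urlopen(request) would return a JSON description of
# the new batch, including an id that can then be polled with GET /batches/{id}.
```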

Science examples

  • todo

Technical examples

  • todo

Project infrastructure

Continuous Integration

Prototype of a component running inside the Cambridge cloud

  1. 'hello world' test to create a server in OpenStack
  2. 'hello world' test to run a task in Spark
  3. 'hello world' test to run a task in Zeppelin
  • PASS/FAIL status reporting for each test
  • Deliverable: Working code in git for the 'hello world' tests (issue #n)
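Since each step needs a machine-readable pass/fail result, one simple option is for every 'hello world' test to emit a small JSON record that a CI system can parse. The sketch below wraps hypothetical test functions and summarises them; the record fields and test names are assumptions, and the OpenStack, Spark and Zeppelin checks are stubbed out (the Zeppelin stub fails deliberately to show a FAIL record).

```python
import json
import time
from typing import Callable, Dict, List


def run_check(name: str, check: Callable[[], None]) -> Dict[str, object]:
    """Run one check and return a machine-readable PASS/FAIL record."""
    started = time.monotonic()
    try:
        check()
        status, detail = "PASS", ""
    except Exception as exc:  # any exception, including AssertionError, counts as FAIL
        status, detail = "FAIL", f"{type(exc).__name__}: {exc}"
    return {
        "test": name,
        "status": status,
        "detail": detail,
        "seconds": round(time.monotonic() - started, 3),
    }


def summarise(results: List[Dict[str, object]]) -> Dict[str, object]:
    """The overall status is PASS only if every individual check passed."""
    overall = "PASS" if all(r["status"] == "PASS" for r in results) else "FAIL"
    return {"status": overall, "results": results}


# Hypothetical stand-ins for the real OpenStack, Spark and Zeppelin tests.
def create_server() -> None:
    pass


def spark_task() -> None:
    pass


def zeppelin_task() -> None:
    raise RuntimeError("notebook server not reachable")


report = summarise([
    run_check("openstack-create-server", create_server),
    run_check("spark-hello-world", spark_task),
    run_check("zeppelin-hello-world", zeppelin_task),
])
print(json.dumps(report, indent=2))
```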

Prototype of a gateway component from GitHub to the internal system

  • PASS/FAIL status reporting propagated back to GitHub.
  • Deliverable: Working code in git to launch the 'hello world' tests from GitHub. (issue #n)
  • Deliverable: A wiki page discussing the security issues of running tests triggered by a pull request. (issue #n)
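Propagating results back to GitHub could use the GitHub commit status API (`POST /repos/{owner}/{repo}/statuses/{sha}`), which accepts a `state` of `pending`, `success`, `failure` or `error`. The sketch below maps a PASS/FAIL result onto that call and builds the request without sending it; the repository, commit SHA, context name and token are placeholders.

```python
import json
import urllib.request

# Map our PASS/FAIL results onto GitHub commit status states.
STATE_FOR_RESULT = {"PASS": "success", "FAIL": "failure"}


def github_status_request(owner: str, repo: str, sha: str, result: str,
                          token: str, target_url: str = "") -> urllib.request.Request:
    """Build (but do not send) a commit status update for one CI result."""
    body = {
        # Anything other than PASS/FAIL is reported as an infrastructure error.
        "state": STATE_FOR_RESULT.get(result, "error"),
        "description": f"hello world tests: {result}",
        "context": "ci/hello-world",  # hypothetical context name
    }
    if target_url:
        body["target_url"] = target_url
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/statuses/{sha}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"token {token}",
        },
        method="POST",
    )
```

The token would need to come from a secret store rather than being hard-coded, and keeping it away from code that a pull request can modify is one of the security issues the wiki page would need to cover.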

Terraform deployment

Initial experiments using StackHPC examples.

  • Should we use Terraform as part of our manual deployments?
    • Probably not, because of issues with shared state.
  • Should we use Terraform as part of our automated deployments?
    • Probably not, because of issues with shared state.
  • Deliverable: Notes on how to get the examples working (issue #58)

Autoscaling Spark/K8s/Magnum deployment

Initial experiments using StackHPC examples.

  • Demonstrate autoscaling in response to load.
  • Question: do we clone the StackHPC code, or use it as a template to roll our own?
  • Deliverable: Notes on how to get the examples working (issue #58)

System services

What do we use to deploy our system services?

  1. A Kubernetes cluster?
  2. A Terraform+Ansible deployment?
  3. A pure Ansible deployment?
  4. A shell scripted deployment?
  • Deliverable: A wiki page describing the options available. (issue #n)

Local Docker registry

Automated deployment for a local Docker registry.

  • Deliverable: Notes in git to deploy a local registry. (issue #n)
    • Using Ansible
    • Using Kubernetes

Secret stores

Evaluate options for handling secrets between infrastructure layers.

  • OpenStack
  • Ansible
  • Terraform
  • Kubernetes
  • Docker
  • Webapps
  • Deliverable: A wiki page describing the options available (issue #n)