Added architecture-patterns.md #332

Open: wants to merge 5 commits into base: main
1 change: 1 addition & 0 deletions blueprints.md
@@ -12,3 +12,4 @@ Where possible this will be a set of fully working components / solutions you ca
| [Automating performance-test decisions using APDEX](practices/performance-testing.md) | Instructions | Recommended | Published |
| [Scanning source code for secrets](tools/nhsd-git-secrets/README.md) | Full solution | Recommended | Published |
| Cross-account backups on AWS | Instructions | In progress | Draft |
| [Architecture Patterns](practices/guides/architecture-patterns.md) | Instructions | In progress | Draft |
59 changes: 59 additions & 0 deletions practices/guides/architecture-patterns.md
@@ -0,0 +1,59 @@
# Architecture patterns

This is an attempt to formalise the decision process for how our services are built, specifically which Cloud Hosting Provider technologies are considered appropriate for different scenarios. The basis for this is the pattern [Outsource bottom up](patterns/outsource-bottom-up.md).

Wherever possible we transfer responsibility for managing infrastructure services from the Customer (NHS England) to the Cloud Hosting provider. This lets us focus on the specific problems we need to solve rather than on managing the underlying infrastructure.

## Compute Selection

### Rule 1 Your Service MUST be built using __Function as a Service__ technologies

Contributor

Suggested change
- ### Rule 1 Your Service MUST be built using __Function as a Service__ technologies
+ ### Rule 1 Your Service SHOULD be built using __Function as a Service__ technologies

See below

This is the default technology for building services using Cloud Hosting Services. The rationale for this is that it:

* Reduces the overhead of infrastructure management to an absolute minimum.
* Prevents the development of Monolithic applications.

Contributor

This is an optimistic take. It doesn't stop you from building a distributed monolith, and if you already can't modularise well, it makes fixing boundary problems harder.

* Can scale automatically to zero, with fast start-up / scale-up times.
* Encourages a reduction in the size of deployable components.

TimCoates marked this conversation as resolved.
Functions SHOULD be delivered as individual capabilities, rather than one function routing traffic internally to different capabilities (see for example this AWS anti-pattern).
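
To make the distinction concrete, here is a minimal sketch of the two styles in Python. It assumes an event dictionary with `httpMethod`, `path` and `body` keys, similar in shape to an API gateway proxy event. The appointment capabilities and all function names are hypothetical and used only for illustration.

```python
import json


def _list_appointments():
    # Hypothetical capability: return a fixed list for illustration.
    return [{"id": 1}, {"id": 2}]


def _create_appointment(body):
    # Hypothetical capability: echo the payload back with an assigned id.
    return {"id": 3, **body}


def _response(status, payload):
    # Build a simple HTTP-style response dictionary.
    return {"statusCode": status, "body": json.dumps(payload)}


# Style the guidance discourages: one deployed function routes internally.
def monolith_handler(event, context=None):
    route = (event["httpMethod"], event["path"])
    if route == ("GET", "/appointments"):
        return _response(200, _list_appointments())
    if route == ("POST", "/appointments"):
        return _response(201, _create_appointment(json.loads(event["body"])))
    return _response(404, {"error": "no such route"})


# Style the guidance prefers: one deployed function per capability, with
# routing left to the platform in front of the functions.
def list_appointments_handler(event, context=None):
    return _response(200, _list_appointments())


def create_appointment_handler(event, context=None):
    return _response(201, _create_appointment(json.loads(event["body"])))
```

Both styles return the same responses for the same requests. What differs is the unit of deployment, scaling and permissions, which is the trade-off under discussion here.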

Contributor

Missing a link? Also I disagree that this is clear-cut enough to assert as a SHOULD. What's the reasoning for us here?

Author

See https://docs.aws.amazon.com/lambda/latest/operatorguide/anti-patterns.html, which lists several drawbacks of Lambda monoliths. The anti-pattern also indicates a misunderstanding of what FaaS 'is', i.e. treating it as just another runtime that I can run my code on.

Contributor

I'd say that there are major holes in that document. All the listed factors other than the granularity of the security boundary make strong assumptions about traffic patterns and deployment workflows which just aren't going to be universally true or relevant. For instance, yes, you do need to have a way to avoid having developers tread on each other's feet for maintenance to be tractable. Is forcing a function-based architecture on the system the right way to do that? Does a function-based architecture even give you that, necessarily? No, not in my experience, unless you couple it to other changes as a consequence, which we may not want to buy into.

That's just one obvious gap. You can make similar arguments about package size, upgrades, code reuse, and especially about testing.

I think if we're going to have a strong position on this one, we need to understand why it's correct for us to be doing it, with reasoning that applies to the specifics of how we develop code, rather than appealing to the authority which we pay per function call.


Our preferred managed services are:

* [Azure Function Apps](https://learn.microsoft.com/en-us/azure/architecture/serverless-quest/reference-architectures)
* [AWS Lambdas](https://docs.aws.amazon.com/whitepapers/latest/serverless-multi-tier-architectures-api-gateway-lambda/sample-architecture-patterns.html)


The following criteria permit __Rule 1__ to be broken

Contributor

If we've got criteria for breaking it, that makes it a SHOULD, not a MUST, as above.

Author

I'm not convinced that "SHOULD xxx unless yyy" makes the SHOULD clause strong enough.

Contributor

Nevertheless, that's the logic here. "Do this, no exceptions" is a MUST. "Do this... unless..." is a SHOULD. I don't think we call out RFC 2119 anywhere here, but that would be the interpretation I would expect readers to have.


* The execution time of calls to your Service is likely to exceed the maximum allowed by a Function service
  * NB: This could suggest that your service could be refactored to reduce batch processing sizes.
* The data payload size required for operating your Service is beyond the limits of FaaS offerings
  * NB: This could suggest that your service could be refactored to reduce batch processing sizes.
* Your Service relies on other proprietary software being installed on the same OS that your code runs on
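
The criteria above can be sketched as a simple check. This is a minimal illustration, assuming the caller supplies the limits of the chosen FaaS offering. The class names, field names and `rule_1_exceptions` are hypothetical, and the limit values in the usage note are placeholders rather than figures from this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaasLimits:
    """Limits of the target FaaS offering, supplied by the caller."""
    max_execution_seconds: float
    max_payload_bytes: int


@dataclass(frozen=True)
class ServiceProfile:
    """Hypothetical description of a Service's runtime needs."""
    expected_execution_seconds: float
    expected_payload_bytes: int
    needs_proprietary_os_software: bool


def rule_1_exceptions(service, limits):
    """Return the reasons, if any, that permit Rule 1 to be broken."""
    reasons = []
    if service.expected_execution_seconds > limits.max_execution_seconds:
        reasons.append("execution time exceeds FaaS limit")
    if service.expected_payload_bytes > limits.max_payload_bytes:
        reasons.append("payload size exceeds FaaS limit")
    if service.needs_proprietary_os_software:
        reasons.append("requires proprietary software on the host OS")
    return reasons
```

For example, with placeholder limits `FaasLimits(max_execution_seconds=900, max_payload_bytes=6 * 1024 * 1024)`, a profile of `ServiceProfile(2.0, 10_000, False)` yields an empty list, so Rule 1 applies. Check your provider's current documentation for the real limits before relying on any numbers.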
TimCoates marked this conversation as resolved.

### Rule 2 Where Rule 1 can NOT be met, your Service MUST be built using __Managed Service Container__ technologies

Where it is not possible to build your service using FaaS, you MUST build it using one or more Containers. The rationale for this is that it:

* Reduces the overhead of infrastructure management.
* Has the ability to automatically scale up and down, with relatively fast start-up / scale-up times.
* Is portable, so can (to some extent) be run on different hosts.
* Encourages a reduction in the size of deployable components.

Our preferred managed services are:

* [AWS ECS](https://aws.amazon.com/ecs/)
  * Only where ECS can NOT be used should [EKS](https://aws.amazon.com/eks/) be considered. This is because ECS shifts more management responsibility from the Customer to the Cloud Hosting provider than EKS does.

Contributor

We probably want to move EKS to CONTAIN for this, and just point to the radar and the Engineering Board to ratify a choice to use it (or update the ring)

Author

I need to fold in how these relate to the Tech Radar then?

Contributor

There's a bit of a question mark here. We've said (internally, no reason for you to have knowledge) that we don't want links to internal SharePoint docs on our externally-published docs. Pointing to the tech radar would imply pointing to the internal docs which detail the tech radar process. I suspect that the right thing to do is to remove this "preferred services" section entirely.

* [Azure Container Apps](https://techcommunity.microsoft.com/t5/apps-on-azure-blog/build-intelligent-apps-and-microservices-with-azure-container/ba-p/3982588)
  * Only where Azure Container Apps can NOT be used should [Azure AKS](https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/containers/aks-microservices/aks-microservices) be considered. This is because Azure Container Apps shifts more management responsibility from the Customer to the Cloud Hosting provider than Azure AKS does.

Contributor

Again, this should be a move of AKS to CONTAIN, really.

Author

Same as for EKS


The following criteria permit __Rule 2__ to be broken

Your Service relies on other proprietary software being installed on the same OS that your code runs on
TimCoates marked this conversation as resolved.

### Rule 3 - Where Rules 1 and 2 can NOT be met, your Service MUST be built on Cloud hosted Virtual Machines

Where your Service needs to be built on Virtual Machine Instances, then the following apply:

* Your Virtual Machines MUST be based on commodity images.

Only where Rules 1, 2 and 3 can NOT be met, your Service may be built using physical on-premises hardware.
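
Taken together, Rules 1 to 3 read as an ordered fallback. The sketch below is one possible reading, not a definitive implementation. `Compute`, `select_compute` and the three flags are hypothetical names. The guide does not state the criteria for leaving Rule 3, so `cannot_use_cloud_vm` is a placeholder assumption.

```python
from enum import Enum


class Compute(Enum):
    FAAS = "Function as a Service"
    CONTAINER = "Managed Service Container"
    VIRTUAL_MACHINE = "Cloud hosted Virtual Machine"
    ON_PREMISES = "Physical on-premises hardware"


def select_compute(*, exceeds_faas_limits, needs_proprietary_os_software,
                   cannot_use_cloud_vm=False):
    """Walk the rules in order and return the first option that can be met."""
    # Rule 1: FaaS, unless execution time or payload limits are exceeded,
    # or proprietary software must be installed on the same OS.
    if not exceeds_faas_limits and not needs_proprietary_os_software:
        return Compute.FAAS
    # Rule 2: managed containers, unless proprietary software must be
    # installed on the same OS.
    if not needs_proprietary_os_software:
        return Compute.CONTAINER
    # Rule 3: cloud hosted virtual machines (assumed exit criterion).
    if not cannot_use_cloud_vm:
        return Compute.VIRTUAL_MACHINE
    # Last resort once Rules 1, 2 and 3 cannot be met.
    return Compute.ON_PREMISES
```

Under this reading, a Service that only exceeds FaaS limits lands on containers, while one that needs proprietary software on the host OS skips straight to virtual machines, because that criterion breaks both Rule 1 and Rule 2.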
TimCoates marked this conversation as resolved.