<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Spine II - An Overview</title>
<meta name="description" content="A walk-through of how The Big NHS Computer was replaced">
<meta name="author" content="Martin Sumner">
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
<link rel="stylesheet" href="css/reveal.css">
<link rel="stylesheet" href="css/theme/black.css" id="theme">
<!-- Code syntax highlighting -->
<link rel="stylesheet" href="lib/css/zenburn.css">
<!-- Printing and PDF exports -->
<script>
var link = document.createElement( 'link' );
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = window.location.search.match( /print-pdf/gi ) ? 'css/print/pdf.css' : 'css/print/paper.css';
document.getElementsByTagName( 'head' )[0].appendChild( link );
</script>
<!--[if lt IE 9]>
<script src="lib/js/html5shiv.js"></script>
<![endif]-->
</head>
<body>
<div class="reveal">
<!-- Any section element inside of this container is displayed as a slide -->
<div class="slides">
<section>
<h2>The Big NHS Computer</h2>
<h3>Erlang Edition</h3>
<p>The Prime Minister said ..</p>
<blockquote cite="http://www.independent.co.uk/voices/commentators/oliver-wright-the-potential-was-huge-but-so-were-the-problems-2330925.html">
“The possibilities are enormous if we can get this right”
</blockquote>
</section>
<section>
<h2>... and then for the next 15 years ...</h2>
<p align="left">Talk through a programme gone bad: 2003 - 2013</p>
<p align="left">Focus on the technology problems</p>
<p align="left">Look at the role played by Erlang in the rescue: 2011 - 2017</p>
<p align="left">Consider what this does and doesn't mean for the future</p>
<p align="left"><small>http://martinsumner.github.io/presentations/spine2_erlang.html#/</small></p>
<p><small>@masleeds</small></p>
</section>
<section>
<p>See Wikipedia </p>
<p><a href="https://en.wikipedia.org/wiki/List_of_failed_and_overbudget_custom_software_projects#Permanent_failures" target="_blank">List of failed and overbudget custom software projects - Permanent Failures</a>
</p>
<img width="1200" height="200" data-src="images/wikipedia.png" alt="Wikipedia Screenshot"/>
</section>
<section>
<h2>The Spine Part - The supplier speaks ...</h2>
<blockquote cite="http://www.globalservices.bt.com/uk/en/casestudy/nhs_spine">
“It has made transformational healthcare applications available to approximately 1.3 million NHS healthcare staff across England, providing care to circa 50 million UK citizens.”
</blockquote>
<blockquote cite="http://www.globalservices.bt.com/uk/en/casestudy/nhs_spine">
“20-plus customised NHS Spine applications ... combined cutting edge technologies to meet the demanding service level agreements and response times required”
</blockquote>
</section>
<section>
<h2>More of their own words</h2>
<blockquote cite="http://www.globalservices.bt.com/uk/en/casestudy/nhs_spine">
“The contract was (and continues to be) one of the largest IT programmes in the world, consuming over 15,000 man-years of effort to date ... Over 3,000 servers are hosted and supported”
</blockquote>
<blockquote cite="http://www.globalservices.bt.com/uk/en/casestudy/nhs_spine">
“(The delivery) methodology is now an internationally recognised standard for complex software development programme delivery”
</blockquote>
</section>
<section>
<p>What did we build again?</p>
<img width="500" height="500" data-src="images/first-death-star.png" alt="The Death Star">
</section>
<section>
<h2>What does this kind of success look like?</h2>
<p align="left">Around <strong>50%</strong> of the original business case met</p>
<p align="left">The system is <strong>stable</strong> when <strong>untouched</strong></p>
<p align="left">... and this makes it a success</p>
<p align="left">Spine can release with <strong>£30m</strong> in transition costs alone</p>
<p align="left">It costs over <strong>£50m</strong> per annum to keep the lights on</p>
<p align="left">... and most people still think of it as a success</p>
</section>
<section>
<img width="700" height="500" data-src="images/stormtrooper-despair.jpg" alt="Stormtrooper Despair">
</section>
<section>
<h2>The people problem</h2>
<p align="left">Parkinson's Law and the generation of work</p>
<p align="left">Conway's Law and the dominance of contractual boundaries</p>
<p align="left">Pournelle's Iron Law of Bureaucracy</p>
<p align="left">Brooks's Law</p>
<p align="left">Anchoring</p>
</section>
<section>
<h2>The technology problem</h2>
</section>
<section>
<h2>The hunt for evidence of slowness</h2>
<p align="left">Linear expectations</p>
<p align="left">Infrastructure chosen through fear of latency not complexity</p>
<p align="left">Scaling often reversed due to latency</p>
<p align="left">Change costs on hitting limits</p>
</section>
<section>
<h2>The Maginot Line - strength is easier to see than weakness</h2>
<p align="left">Security theatre</p>
<p align="left">Data is in the system, not just the database</p>
<p align="left">Races between redundancy protocols</p>
</section>
<section>
<h2>Every problem looks like a network problem</h2>
<p align="left">Distribution compounded by complexity, variety and contracts</p>
<p align="left">Humans anchored to the literal reading of logs</p>
<p align="left">Network state management - timeouts, Nagle's algorithm, connection pools</p>
</section>
<section data-background="#A3C2FF">
<blockquote cite="https://github.com/GovernmentCommunicationsHeadquarters/BoilingFrogs/blob/master/GCHQ_Boiling_Frogs.pdf">
“They shouldn't build these death stars any more. They keep getting blown up”
</blockquote>
<img width="500" height="500" data-src="images/death-star-2.jpg" alt="The Death Star">
</section>
<section data-background="#A3C2FF">
<h2>Planning the technology solution</h2>
<p align="left">Looked sideways. Looked backwards</p>
<p align="left">Erlang gave vision of availability and software-driven scale</p>
<p align="left">The power of small unified teams with a common goal</p>
<img width="600" height="300" data-src="images/PlanningMeeting.jpeg" alt="Planning Meeting">
</section>
<section data-background="#A3C2FF">
<h2>The Actor Model</h2>
<p align="left">Used Erlang products as building blocks - esp Riak</p>
<p align="left">Used RabbitMQ (and Tornado) as a means to support Python ...</p>
<p align="right">... with async message passing between actors</p>
<p align="right">... with generalised behaviours</p>
<p align="right">... and a small number of common paths</p>
</section>
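<section data-markdown data-background="#A3C2FF">
<script type="text/template">
A minimal sketch of async message passing behind a generalised behaviour, in Python. This is not the Spine code: the `Actor`, `Doubler`, `Collector` and `run_pipeline` names are hypothetical, and a thread with a `queue.Queue` mailbox stands in for an Erlang process.

```python
import queue
import threading


class Actor:
    """Hypothetical generic behaviour: owns the mailbox and the receive loop.

    Subclasses only supply handle_cast, so every actor shares one code path.
    """

    _STOP = object()  # sentinel that tells the loop to exit

    def __init__(self):
        self.mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def cast(self, message):
        """Fire-and-forget send: enqueue and return immediately."""
        self.mailbox.put(message)

    def stop(self):
        """Let the actor finish queued work, then wait for it to exit."""
        self.mailbox.put(self._STOP)
        self._thread.join()

    def _loop(self):
        while True:
            message = self.mailbox.get()
            if message is self._STOP:
                return
            self.handle_cast(message)

    def handle_cast(self, message):
        raise NotImplementedError


class Collector(Actor):
    def __init__(self):
        self.seen = []  # set before the receive loop starts
        super().__init__()

    def handle_cast(self, message):
        self.seen.append(message)


class Doubler(Actor):
    def __init__(self, downstream):
        self.downstream = downstream
        super().__init__()

    def handle_cast(self, message):
        self.downstream.cast(message * 2)


def run_pipeline(values):
    collector = Collector()
    doubler = Doubler(collector)
    for value in values:
        doubler.cast(value)
    doubler.stop()    # drains the doubler's mailbox first
    collector.stop()  # then drains everything that was forwarded
    return collector.seen
```
</script>
</section>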
<section data-background="#A3C2FF">
<h2>Scaling</h2>
<p align="left">Avoided logical bottlenecks</p>
<p align="left">Cast and callbacks - unless we can absorb back-pressure</p>
</section>
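<section data-markdown data-background="#A3C2FF">
<script type="text/template">
One possible reading of "cast and callbacks", sketched in Python under stated assumptions: the sender never blocks, the reply arrives through a callback, and a bounded mailbox makes back-pressure visible as an immediate refusal. `BoundedWorker`, its capacity and the upper-casing "work" are all hypothetical.

```python
import queue


class BoundedWorker:
    """Hypothetical worker with a bounded mailbox, drained by explicit calls."""

    def __init__(self, capacity):
        self.mailbox = queue.Queue(maxsize=capacity)

    def cast(self, request, callback):
        """Non-blocking send. Returns False when the mailbox is full,
        so the sender sees back-pressure instead of waiting."""
        try:
            self.mailbox.put_nowait((request, callback))
            return True
        except queue.Full:
            return False

    def drain(self):
        """Process everything queued; deliver each result via its callback."""
        handled = 0
        while True:
            try:
                request, callback = self.mailbox.get_nowait()
            except queue.Empty:
                return handled
            callback(request.upper())  # placeholder for real work
            handled += 1


def demo():
    replies = []
    worker = BoundedWorker(capacity=2)
    accepted = [worker.cast(word, replies.append) for word in ["a", "b", "c"]]
    worker.drain()
    return accepted, replies
```

A blocking call would only suit a sender that can afford to wait while the mailbox empties.
</script>
</section>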
<section data-background="#A3C2FF">
<h2>Failure</h2>
<p align="left">Handle failure by processing elsewhere</p>
<p align="left">No triage to determine operational process</p>
<p align="left">Slow triage to determine cause</p>
<p align="left">Automate failover globally by deep-ping of path</p>
</section>
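<section data-markdown data-background="#A3C2FF">
<script type="text/template">
A rough Python sketch of failover driven by a deep ping, assuming a deep ping means probing every hop on a path and not just the front door. The site names, hop names and `probe` function are hypothetical; a real probe would make a network request.

```python
def deep_ping(path, probe):
    """True only if every hop on the path answers the probe."""
    return all(probe(hop) for hop in path)


def choose_site(sites, probe):
    """Return the first site whose whole path is healthy, else None.

    No triage happens here: an unhealthy path is simply skipped and
    the work is processed elsewhere.
    """
    for name, path in sites:
        if deep_ping(path, probe):
            return name
    return None


def demo():
    sites = [
        ("site-a", ["lb-a", "app-a", "db-a"]),
        ("site-b", ["lb-b", "app-b", "db-b"]),
    ]
    down = {"db-a"}  # pretend one deep hop has failed
    return choose_site(sites, lambda hop: hop not in down)
```

Here `site-a` still answers at its load balancer, but the failed `db-a` hop moves traffic to `site-b`.
</script>
</section>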
<section data-background="#A3C2FF">
<h2>Network hops</h2>
<p align="left">Standardise protocols - AMQP/HTTP</p>
<p align="left">Operational visibility of network hops ...</p>
<p align="right">... Biggest speed-up is visibility</p>
</section>
<section data-background="#A3C2FF">
<h2>Automate</h2>
<p align="left">Remove human hands - other than to pause/reflect</p>
<p align="left">Don't allow automation to excuse complexity</p>
<p align="left">Security benefits of disconnection</p>
<p align="left">Make rehearsal constant and natural</p>
</section>
<section data-background="#A3C2FF">
<h2>Discipline</h2>
<p align="left">No Silver Bullet - all be great designers</p>
<p align="left">Logs as important as tests</p>
<p align="left">Enforce opportunities to work from logs</p>
<p align="left">Reason end-to-end and test end-to-end - invert the pyramid</p>
</section>
<section data-background="#E6E68A">
<h2>What did/does it cost?</h2>
<p align="left">Took <strong>100 person-years</strong> from inception to one year's service</p>
<p align="left">Requires just over <strong>100 commodity 1RU servers</strong> in live</p>
<p align="left">Release costs are <strong>&lt; 0.1%</strong> of previous release costs</p>
<p align="left"><strong>90%</strong> reduction in operating costs</p>
<p align="left">Total running team of <strong>30</strong> people supporting and ...</p>
<p align="left">... Managing more than <strong>£10m</strong> pa of change backlog</p>
<p align="left">Adding the same <strong>slow</strong> node resolves any capacity issue</p>
</section>
<section data-background="#E6E68A">
<h2>Does it work?</h2>
<p align="left">(Nearly) like-for-like functional replacement ...</p>
<p align="left"><strong>99.999%</strong> available since go live</p>
<p align="left">Supports over <strong>300</strong> message interactions, eight UI applications</p>
<p align="left"><strong>45M</strong> messages a day</p>
<p align="left">Provides access to <strong>1.5bn</strong> records and documents</p>
<p align="left">Aggregate reduction in wait time is over <strong>800 working days</strong> each day</p>
</section>
<section data-background="#E6E68A">
<h2>Positive Erlang Lessons</h2>
<p align="left">It led us to a new way of thinking about failure ...</p>
<p align="left"> ... and about the boundary between network and application</p>
</section>
<section data-background="#E6E68A">
<h2>Positive Erlang Lessons</h2>
<p align="left">Asynchronous message passing had a deep impact ...</p>
<p align="left"> ... With standardised paths and behaviours</p>
</section>
<section data-background="#E6E68A">
<h2>Positive Erlang Lessons</h2>
<p align="left">Per-process overheads and mistakes ...</p>
<p align="left"> ... Regret not embracing it more deeply</p>
</section>
<section data-background="#E6E68A">
<h2>Positive Erlang Lessons</h2>
<p align="left">It brought us closer to computer science ...</p>
<p align="left"> ... and pushed us away from vendors</p>
</section>
<section data-background="#E6E68A">
<h2>Woot! So government will learn from this right ....</h2>
</section>
<section data-background="#E6E68A">
<blockquote cite="https://martinfowler.com/articles/microservices.html">
“Be of the web not behind the web”
</blockquote>
<blockquote cite="https://www.theregister.co.uk/2012/04/02/gov_it_contract_cap/">
“Blighty slaps £100m spending cap on govt IT projects”
</blockquote>
</section>
<section data-background="#E6E68A">
<h2>What if every 3-page webapp .... </h2>
<img width="700" height="500" data-src="images/DeathStarArchitectureDiagrams.jpg" alt="Death Star Architecture Diagrams">
</section>
<section data-background="#E6E68A">
<h2>Thousands of small projects - all of them poor value</h2>
<img width="1000" height="500" data-src="images/GalacticSenate.png" alt="Galactic Senate">
</section>
<section data-background="#E6E68A">
<h2>Thank you</h2>
<p>http://martinsumner.github.io/presentations/spine2_erlang.html#/</p>
<p>@masleeds</p>
</section>
</div>
</div>
<script src="lib/js/head.min.js"></script>
<script src="js/reveal.js"></script>
<script>
// Full list of configuration options available at:
// https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
controls: true,
progress: true,
history: true,
center: true,
transition: 'slide', // none/fade/slide/convex/concave/zoom
// Optional reveal.js plugins
dependencies: [
{ src: 'lib/js/classList.js', condition: function() { return !document.body.classList; } },
{ src: 'plugin/markdown/marked.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: 'plugin/markdown/markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: 'plugin/highlight/highlight.js', async: true, condition: function() { return !!document.querySelector( 'pre code' ); }, callback: function() { hljs.initHighlightingOnLoad(); } },
{ src: 'plugin/zoom-js/zoom.js', async: true },
{ src: 'plugin/notes/notes.js', async: true }
]
});
</script>
</body>
</html>