-
Notifications
You must be signed in to change notification settings - Fork 0
/
presentation.qmd
734 lines (645 loc) · 21 KB
/
presentation.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
---
title: "An AI tool for Dementia"
#subtitle: "It's containers all the way down"
author: "Simon Clifford"
institute: "University of Cambridge"
date: "2024-10-24"
date-format: "Do MMMM YYYY"
format:
revealjs:
slide-number: true
chalkboard:
buttons: false
preview-links: auto
logo: "images/camb_logo_transparent.svg"
css: styles.css
theme: [sky,mytheme.scss]
mermaid-format: svg
title-slide-attributes:
data-background-image: images/intended_layout.svg
data-background-size: "60%"
data-background-repeat: repeat
data-background-opacity: "0.5"
---
## What is AI Dementia?
::: {.notes}
The algorithm:
1. Zoe's group is in the Adaptive Brain Lab, Cambridge.
1. Takes a 3D brain scan (a magnetic resonance image, MRI).
2. Processes it to reduce it to only showing grey matter.
3. Obtains some cognitive scores, usually from a database.
4. Runs the fairly simply Python code to reduce all this to a single score.
1. Uses a Generalised Metric Learning Vector Quantization, GMLVQ.
:::
The method^1,2^ developed by Zoe Kourtzi's group
uses a machine learning (ML) algorithm trained on brain images (MRIs) and cognitive
scores to give a prognostic score of the likelihood of the patient having Alzheimer's.
::: footer
1: https://doi.org/10.1016/j.nicl.2020.102199
2: https://doi.org/10.1101/2020.08.15.252601
:::
```{mermaid}
%%| fig-width: 1000.0px
%%| fig-height: 400.0px
graph LR
A[Brain MRI] --> B[Gray Matter]
B --> C[ML Model]
D[Cognitive Scores] --> C
```
## The XNAT prototype
::: {.notes}
Inherited the project from an earlier incarnation developed by Piero and Matt.
XNAT is a research platform for importing and processing image data. However, it's not generally deployed in clinical settings.
:::
![XNAT prototype, Coronica & Archer](images/Compute-output.jpeg)
## Structure of the prototype
::: {.notes}
The code has various scripts that fire on particular XNAT triggers to perform the
algorithm.
REDCap is a "secure web application for building online surveys and databases".
It provides an API for recovering previously entered data. It's commmonly used
in clinical research settings.
The Docker container is where the ML code is.
But Stretch is no longer supported, SPM12 was last updated in Jan 2020, and
Matlab 2010a... Alas, it turns out we don't have the Matlab source code, only
compiled `.mat` files, so we are stuck until our collaborators come up with the
source.
:::
```{mermaid}
%%| fig-width: 1000.0px
%%| fig-height: 400.0px
graph LR
A[XNAT server] --> B[Docker container]
A -.-> C[(REDCap server)]
```
The Docker container holds a
[Debian Stretch^2^ image]{.fragment .highlight-red} with [SPM12^3^]{.fragment
.highlight-red} and a [Matlab 2010a]{.fragment .highlight-red} runtime. SPM is
a Matlab library for analysing brain images.
::: footer
1: https://projectredcap.org/
2: Debian Stretch June 2017 to June 2022, https://www.debian.org/releases/stretch/
3: https://www.fil.ion.ucl.ac.uk/spm/software/spm12/
:::
## Docker^1^
::: notes
1. OK, going to be talking about Docker a lot, so what is it?
1. Processes run in containers, cannot interact with processes, filesystems, or
networks in other containers or on the host unless explicitly permitted.
1. Containers are built to be fully configured and ready to run.
:::
::: {.incremental}
- Docker: platform for building, shipping, and running applications.
- _Containers_: more lightweight that virtualisation.
:::
::: {.fragment .fade-in}
```Dockerfile
FROM python:3.8-slim-bullseye
WORKDIR /
COPY provisioning-base-os.sh /opt
RUN /opt/provisioning-base-os.sh
COPY start-ai_dementia_app.sh /opt
COPY provisioning-ai_dementia_app.sh /opt
COPY ai_dementia_app.tgz /opt
RUN /opt/provisioning-ai_dementia_app.sh
RUN rm /opt/ai_dementia_app.tgz /opt/provisioning-ai_dementia_app.sh /opt/provisioning-base-os.sh
WORKDIR /opt/ai_dementia_web
ENTRYPOINT ["/opt/start-ai_dementia_app.sh"]
```
:::
::: footer
1: https://www.docker.com/
:::
## Images
::: notes
1. Prototype was handed images
1. We need to get them in a hospital setting.
1. Needs to download image data from a Picture Archiving and Communication
Service (PACS).
1. DICOM (Digital Imaging and Communications in Medicine) started in the mid-_80's_.
:::
::: {.incremental}
- Where are clinical images stored in a hospital?
- A PACS.
- What is a PACS?
- Essentially a networked database.
- What does it speak?
- DICOM.
- Defines file format and transfer protocol.
- _Really_ comprehensive metadata.
:::
## Images
::: notes
1. Note the Patient "Genetic Modifications Sequence" and "Species" information.
:::
![](images/dicom_example.png)
## Images
::: notes
1. We need for a user to be able to query a PACS for patient info, scan details,
and finally download data.
2. The prototype works on NIfTI images. A format commonly used to store MRI
brain imaging data, popular with researchers, less so with clinicians.
3. The segmentation uses standard SPM12 functions to leave only grey matter.
:::
::: {.incremental}
- Use Pynetdicom^1^ to search for and download images.
- Use several tools to convert to NIfTI^2^ format.
- Use existing SPM12 container to segment into grey matter image.
- Note this segmentation is _slow_!
:::
::: footer
1: https://pydicom.github.io/pynetdicom/stable/
2: Neuroimaging Informatics Technology Initiative: https://nifti.nimh.nih.gov/
:::
## Building upon the prototype {.smaller}
::: {.notes}
1. Auth could later be SSO or whatever.
1. Search is within the PACS.
1. Relevant studies are MRI scans, perhaps with a particular description.
1. The download may be slow, or delayed.
1. The score retrieval may also be delayed.
1. The segmentation takes minutes at 100% CPU.
1. The result is stored locally because the user may have wandered off!
:::
The PIs want to take this from the prototype to something that can be used in a
clinical setting.
::: {.incremental}
1. The user, a clinician, will authenticate to the app.
1. They will then search for a patient, using patient ID, date of birth, and
name.
1. Once they find them, they choose from any relevant studies.
1. When they select the study they want the request begins:
1. The study is downloaded.
1. The cognitive scores are retrieved.
1. The study is converted to NIfTI, then segmented.
1. The ML code is run on this and the cognitive scores.
1. The result is stored locally.
:::
## The New App
::: {.notes}
1. XNAT no good because not widely deployed in clinical settings.
1. Want some client-server structure.
1. ADVANCE.
1. First deployed in Addenbrooke's, then other Cambridge Trust hospital, finally
some third party.
1. Turns out hospitals are putting all their infra in this fancy fly-by-night
thing called the cloud.
1. And they don't want random researchers installing things into their web infra.
:::
[![](images/AWEBAPP.gif)]{.fragment .fade-up}
::: {.incremental}
1. Everyone has a web browser, no need to worry about clients.
2. Established network protocol, security available.
3. [[Can install it into the existing hospital web infra.]{.fragment .strike}]{.fragment .semi-fade-out}
4. I've written web apps before.
:::
## Flask
::: notes
1. My favourite!
1. Flask doesn't require any particular database or structure.
1. Instead you add extensions for those things.
1. In the example explain this is the entire application. The decorator
registers the `hello_world` function to the `"/"` route.
1. When a request comes in on that route the function is run and its return
value sent back to the client.
1. Usually have Jinja templates (unless designing an API) which can do
Python-like logic.
:::
Flask^1^ is a lightweight and extendable web microframework.
::: {.fragment .fade-in}
A (very) minimal application is:
```{.python}
from flask import Flask, render_template
app = Flask(__name__)
@app.route("/")
def hello_world():
return "<p>Hello, World!</p>"
@app.route('/hello/')
@app.route('/hello/<name>')
def hello(name=None):
return render_template('hello.html', person=name)
```
:::
::: footer
1: https://flask.palletsprojects.com/
:::
## Flask App
So we wrote a fairly simple Flask app with the following routes:
```
Endpoint Methods Rule
------------------------ --------- -----------------------
auth.login GET, POST /login
main.home_page GET /
main.search_previous GET, POST /search_previous
main.select_patient GET, POST /select_patient
main.select_series GET /select_series
main.select_study GET /select_study
main.explain_no_config GET /explain_no_config
main.request_score GET /request_score
main.confirm_delete GET /confirm_delete
main.delete_request GET /delete_request
main.view_request GET /view_request
admin.add_user GET, POST /add_user
admin.admin_home_page GET /admin
admin.edit_configuration GET, POST /edit_configuration
admin.edit_user GET, POST /edit_user
admin.edit_users GET /edit_users
admin.view_logs GET /view_logs
static GET /static/<path:filename>
```
::: notes
1. There's a login page that sets a local cookie once authenticated. Handled by
a Flask extension, Flask-login. All other routes are marked as "for logged in
users only".
1. The `main` routes allow the clinician to search a PACS for patients, then for
MRI studies for that patient, then download them.
1. Also to submit a request for a score calculation.
1. Forms go to `POST` routes, simplified by an extension, Flask-WTForms.
1. Finally some admin routes that are only available to users marked as "admin",
for adding users and so on.
:::
## Where to run the flask app?
::: {.incremental}
1. We don't want to run it inside the Docker container with the ancient Matlab,
Debian, etc, because that's old.
1. Don't want to assume what the client will have installed.
1. I know....
:::
::: {.fragment .fade-in}
::: {.r-fit-text}
Another Docker container!
:::
:::
## App design
```{mermaid}
%%| fig-width: 1000.0px
%%| fig-height: 400.0px
graph LR
A[Flask app container] --> B[ML container]
A -.-> C[(REDCap server)]
```
::: notes
1. The Flask app handles the users and interacts with the ML container and
the remote REDCap server.
:::
## Database
::: {.incremental}
- App will need users, authentication.
- So needs somewhere to store usernames, hashes.
- Also track the requests a clinician has made.
- _Could_ use SQLite but gives problems later.
- Really like to use PostgreSQL^1^.
- Should we assume hospital has it installed?
- No...
:::
::: {.fragment .fade-in}
![](images/a_container.gif)
:::
::: notes
1. While it is possible to hook into SSO, etc, we will start with a local
authentication mechanism.
1. The clinician makes a request, we will need to store information like patient
details somewhere.
1. SQLite builtin to Python, nice database but it will prove problematic in later steps.
:::
::: footer
1: https://www.postgresql.org/
:::
## App design 2
```{mermaid}
%%| fig-width: 1000.0px
%%| fig-height: 400.0px
graph LR
A[Flask app container]-->B[ML container]
A-->D[PostgreSQL server]
B-->D
A-.->C[(REDCap server)]
```
## SQLAlchemy^1^
::: notes
1. SQLAlchemy is a Python SQL toolkit and Object Relational Manager.
1. Captures the description of each table in the DB in _Python_ code.
6. Only store the schema in one place.
2. Can automatically create the entire DB. Extensions to migrate between versions.
3. Provides Pythonic way to address database joins.
4. So in example shown, each Request references a single User by its `id` (line 15).
5. By declaring a `relationship` (line 16) we can reference the User object that
the request is owned by directly. We also use `back_populates` to get a list of
Requests that each User owns. All accessing the database is lazy.
:::
::: {.fragment .fade-in}
```python
class User(UserMixin, db.Model):
id = db.Column(db.Integer, primary_key=True)
username = db.Column(db.String(length=64), unique=True, index=True,
nullable=False)
# as per https://www.rfc-editor.org/errata_search.php?rfc=3696&eid=1690
email = db.Column(db.String(length=254))
password_hash = db.Column(db.String(length=128), nullable=False)
is_admin = db.Column(db.Boolean, nullable=False, default=False)
requests = relationship('Request')
class Request(db.Model):
id = db.Column(db.Integer, primary_key=True) # internal ID
user_id = db.Column(db.Integer, db.ForeignKey('user.id'), nullable=False)
user = relationship('User', back_populates='requests')
series_instance_uid = db.Column(db.String(length=64),
nullable=False) # UI
patient_id = db.Column(db.Text, ForeignKey('patient.patient_id'),
nullable=False) # LO
patient = relationship('Patient')
study_description = db.Column(db.Text) # LO
study_date = db.Column(db.Date) # DA
series_description = db.Column(db.Text) # LO
series_date = db.Column(db.Date) # DA
series_description = db.Column(db.Text) # LO
mr_scan_path = db.Column(db.Text)
mr_scan_path_nifti = db.Column(db.Text)
gm_scan_path = db.Column(db.Text)
# will fix but currently scores parsed as
# comma sep list by scripts
score_data = db.Column(db.Text)
# result_data stores JSON encoded result data.
result_data = db.Column(db.Text, default='{}')
request_date = db.Column(db.DateTime, server_default=func.now())
result_date = db.Column(db.DateTime)
# Choose 40 chars because we might use git commit SHAs as version
# markers one day?
result_version = db.Column(db.String(40))
task_id = db.Column(db.String(36))
```
:::
::: footer
1: https://www.sqlalchemy.org/
:::
## SQLAlchemy
::: notes
1. There's also a Flask-SQLAlchemy extension.
:::
```python
>>> simon = User.query.get(1)
>>> simon.username
'simon'
>>> simon.password_hash
'pbkdf2:sha256:600000$P950Nsh7qgGMblJB$8b247bac3ff808890f3bf4e316fdfa77ab03d749c534f6227d6dc6676275b60c'
>>> simon.requests
[<Request 4>, <Request 5>, <Request 6>]
```
## The segmentation process
::: {.notes}
1. The segmentation process may be slow. Might take minutes, should probably
run serially with other segmentations.
1. But web requests are pretty much immediate (apart from streaming, etc).
1. So we need an asynchronous system: user submits request, goes away, comes
back
later.
1. Celery is a task queue sytem: put tasks in the queue, workers will process
them.
1. Celery needs a broker, effectively a fast database / cache, which manages the tasks.
1. Redis can do this.
1. Do we assume it's there...?
:::
::: {.incremental}
- Segmentation is quite slow.
- A web request is expected to finish quickly.
- Need asynchronous task management.
- Celery^1^!.
- But celery needs a broker.
- Redis^2^.
- _Where can we run Redis?_
:::
## You guessed it!
![](images/another_container.gif)
::: footer
1: https://docs.celeryq.dev/en/stable/
2: https://redis.io/
:::
## App design ~~2~~ 3
```{mermaid}
%%| fig-width: 1000.0px
%%| fig-height: 400.0px
graph LR
A[Flask app container]-->B[ML container]
A-->D[PostgreSQL server]
B-->D
A-->E[Redis container]
B-->E
A-.->C[(REDCap server)]
```
## Network security paramount
::: notes
1. App accessed through specialised interface called WSGI.
1. WSGI servers are usually specialised for this purpose, not always for TLS.
1. The web infra that we may or may not be installing into may already have TLS (i.e. we're
a part of an existing website).
1. So this needs to be optional: a reverse proxy that handles TLS and isn't missed if it's not there.
1. Easily done in nginx.
:::
::: {.incremental}
- Web Services Gateway Interface.
- Host may already do this.
- Use nginx^1^.
- Where can it go?
- WHERE CAN IT GO, SIMON?
:::
::: footer
1: https://nginx.org/
:::
## Con Tainer?
![](images/yet_another_container.gif)
## App design ~~2~~ ~~3~~ 4
```{mermaid}
%%| fig-width: 1000.0px
%%| fig-height: 400.0px
graph LR
A[Flask app container]-->B[ML container]
A-->D[PostgreSQL server]
B-->D
A-->E[Redis container]
B-->E
A-.->C[(REDCap server)]
A-.->Z[nginx]
```
## What about testing?
::: notes
1. Can't test against live PACS for legal as well as sanity reasons.
1. Some mocking is being used, but that can end up skipping or mocking the bits
we're trying to test.
1. Orthanc is a free and open source DICOM server for medical imaging from Belgium.
1. Safe data is MRI studies that are public domain or that have consented to this use.
:::
::: {.incremental}
- Can't test against live PACS, want to test PACS parts.
- Mocking? But this is the hard bit.
- Run a DICOM server (Orthanc^1^) with safe data.
- Oh ho ho no no.
:::
::: footer
1: https://www.orthanc-server.com/
:::
## App design ~~2~~ ~~3~~ ~~4~~ a millionty
```{mermaid}
%%| fig-width: 1000.0px
%%| fig-height: 400.0px
graph LR
A[Flask app container]-->B[ML container]
A-->D[PostgreSQL server]
B-->D
A-->E[Redis container]
B-->E
A-.->C[(REDCap server)]
A-.->Z[nginx]
A-.->X[Orthanc]
```
## Docker compose
::: {.incremental}
- Docker compose manages multiple containers (_services_).
- Easier to define shared networks, storage.
:::
::: {.fragment .fade-in}
```Dockerfile
# Helpful name prefix for all containers.
name: ai_dementia
services:
# Test PACS, use for development / testing only.
orthanc:
# build: ./orthanc_container
image: registry.gitlab.developers.cam.ac.uk/rcs/rse/ai-4-medical-images/ai_dementia_web/containers_orthanc:latest
restart: unless-stopped
# ports: ["104:4242", "8080:8042"]
ports: ["4242:4242"]
#ports: ["8080:8042"]
# uncomment this when adding entries
#volumes: ["/tmp/orthanc-storage:/var/lib/orthanc"]
environment:
ORTHANC__NAME: "The Orthanc"
VERBOSE_ENABLED: "true"
ORTHANC__REGISTERED_USERS: |
{"demo": "demo"}
ORTHANC__DICOM_MODALITIES: |
{
"aid": ["AIDSCU", "webapp", "4242"],
"aidmov": ["STORESCP", "webapp", "11113"]
}
networks:
backend:
aliases:
- orthanc-host
# The broker and backend for Celery.
redis:
image: redis
ports: ["6379:6379"]
restart: unless-stopped
networks:
backend:
aliases:
- redis-host
# The app that serves the pages.
webapp:
# build: ./webapp_container
image: registry.gitlab.developers.cam.ac.uk/rcs/rse/ai-4-medical-images/ai_dementia_web/containers_webapp:latest
pull_policy: never
restart: unless-stopped
ports: ["127.0.0.1:8000:8000"]
volumes:
- webapp-storage:/shared-volume
- config-storage:/config:ro
depends_on:
- db
- redis
networks:
backend:
aliases:
- webapp-host
# Container for the ML calculation parts, MATLAB, etc.
aid_ml:
# build: ./aid_ml_container
image: registry.gitlab.developers.cam.ac.uk/rcs/rse/ai-4-medical-images/ai_dementia_web/containers_aid_ml:latest
pull_policy: never
restart: unless-stopped
volumes:
- webapp-storage:/shared-volume
- config-storage:/config:ro
depends_on:
- db
- redis
networks:
backend:
aliases:
- aid_ml-host
# Postgres to store intermediate and final results.
db:
image: postgres
restart: unless-stopped
environment:
POSTGRES_PASSWORD_FILE: /config/app_config/db_password.txt
POSTGRES_DB: ai_dementia
# Uncomment this to access DB from outside containers
# DO NOT DO THIS IN PRODUCTION
ports: ["127.0.0.1:5432:5432"]
volumes:
- db-storage:/var/lib/postgresql/data
- config-storage:/config:ro
networks:
backend:
aliases:
- db-host
# Reverse proxy for the webapp. _May_ not be needed, if client
# infrastructure can already proxy. This reverse proxy must be able to
# handle TLS / SSL because the webapp doesn't.
proxy_webserver:
image: registry.gitlab.developers.cam.ac.uk/rcs/rse/ai-4-medical-images/ai_dementia_web/containers_proxy:latest
build: ./proxy_server_container
pull_policy: never
ports: ["443:443"]
volumes:
- config-storage:/config:ro
networks:
backend:
aliases:
- proxy-host
depends_on:
- webapp
environment:
- NGINX_PORT=443
# Deploy container
# TBD
volumes:
# Storage for intermediate files / images.
webapp-storage:
# Storage for the database.
db-storage:
# Storage for configuration and startup / update / etc.
config-storage:
networks:
# Network for containers to talk to each other.
backend:
```
:::
## App design ~~2~~ ~~3~~ ~~4~~ ~~a millionty~~ FINALEST
```{mermaid}
graph LR
subgraph Web App container
A[web app]
end
subgraph PACS facing Container
E[PACS download worker]
F[Image pre-process workers]
end
subgraph Redis Container
A-->|submits tasks|B[Redis]
B-->|runs tasks N=1|E
B-->|runs tasks|F
end
subgraph AD Container
B-->|runs tasks|C[REDCap query workers]
B-->|runs tasks|D[Compute score workers]
end
subgraph DB container
G[database]
A-->G
C-->G
E-->G
F-->G
D-->G
end
```