---
title: "Road User video evidence of road traffic offences: Preliminary analysis of Operation Snap data and suggestions for a research agenda"
bibliography: [references.bib, opsnap.bib]
editor:
  markdown:
    wrap: sentence
# keywords:
# - Road safety
# - Video evidence
# - Near misses
# - Operation Snap
# - Dangerous driving
# - Antisocial driving
date: last-modified
format:
  # html: default
  # docx: default
  # pdf: default
  arxiv-pdf:
    keep-tex: true
    # linenumbers: true # Add (continuous) line numbers?
    # doublespacing: false # Double space the PDF output?
    # runninghead: "Preprint" # The text on the top of each page of the output
    # authorcols: true # Should authors be listed in a single column (default) or in multiple columns (`authorcols: true`)
execute:
  echo: false
  warning: false
  cache: false
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE
)
```
```{r}
#| eval: false
#| echo: false
# quarto::quarto_render("paper.qmd", output_format = "pdf")
# file.copy("paper.pdf", "~/OneDrive/opsnap_leeds/paper-no-line-numbers-v12.pdf")
# quarto::quarto_render("paper.qmd", output_format = "arxiv-pdf")
# file.copy("paper.pdf", "paper-v12.pdf")
# file.copy("paper.pdf", "~/OneDrive/opsnap_leeds/paper-v12.pdf")
# quarto::quarto_render("paper.qmd", output_format = "docx")
# # file.copy("paper.docx", "~/OneDrive/opsnap_leeds/paper-v12.docx")
# file.copy("paper.docx", "paper-v12.docx")
# quarto::quarto_render("title.qmd", output_format = "arxiv-pdf")
# browseURL("title.pdf")
# system("gh release upload v1 --clobber paper-v12.docx")
# browseURL("paper.docx")
# # Install extension:
# system("quarto install extension mikemahoney218/quarto-arxiv")
# system("sudo apt install lmodern")
```
```{r setup}
#| include: false
devtools::load_all()
```
```{r}
#| include: false
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
library(tidyverse)
library(gt)
library(gtExtras)
remotes::install_github("ITSLeeds/opsnap")
```
```{r}
#| include: false
#| echo: false
devtools::load_all()
```
```{r}
#| echo: false
#| eval: false
library(rvest)
query = ".file-link a"
url = "https://www.westyorkshire.police.uk/SaferRoadsSubmissions"
files_available = read_html(url) |>
  html_nodes(query) |>
  html_text()
links_available = read_html(url) |>
  html_nodes(query) |>
  html_attr("href")
urls = paste0("https://www.westyorkshire.police.uk", links_available)
file_names = basename(links_available)
tibble::tibble(file_names) |>
knitr::kable()
```
```{r}
#| include: false
# In preparation for reading-in all data:
# date_str = format(Sys.Date(), "%Y-%m")
date_str = format(as.Date("2024-04-01"), "%Y-%m")
file_name = paste0("data/west-yorkshire/operation_snap_", date_str, ".csv")
```
```{r}
#| eval: false
#| echo: false
# Download them to raw_data/west-yorkshire:
dir.create("raw_data/west-yorkshire", recursive = TRUE, showWarnings = FALSE)
pbapply::pblapply(urls, function(u) {
  download.file(u, paste0("raw_data/west-yorkshire/", basename(u)))
})
# Test with 2nd url:
u = urls[2]
d2 = opsnap:::download_and_read(u)
d_all = purrr::map_df(urls, opsnap:::download_and_read)
dir.create("data/west-yorkshire", recursive = TRUE, showWarnings = FALSE)
# Date to nearest month:
write_csv(d_all, file_name)
```
<!-- The data is open access and looks like this, with names cleaned up by the package: -->
```{r}
#| eval: false
#| echo: true
#| include: false
u = "https://www.westyorkshire.police.uk/sites/default/files/2024-01/operation_snap_oct-dec_2023_0.xlsx"
d = opsnap:::download_and_read(u)
# Old names:
# [1] "REPORTER TRANSPORT MODE" "OFFENDER VEHICLE MAKE"
# [3] "OFFENDER VEHICLE MODEL" "OFFENDER VEHICLE COLOUR"
# [5] "OFFENCE" "DISTRICT"
# [7] "DISPOSAL" "DATE OF SUBMISSION"
# [9] "...9" "OFF LOCATION"
# New names:
# [1] "mode" "make" "model" "colour" "offence" "district" "disposal"
# [8] "date" "location"
```
<!-- The data looks like this (first 3 rows shown): -->
```{r}
#| echo: false
#| include: false
file_exists = file.exists(file_name)
if (!file_exists) {
  # stop("try downloading the data first!")
  u = "https://github.com/ITSLeeds/opsnap/releases/download/v1/operation_snap_2024-04.csv"
  f = basename(u)
  download.file(url = u, destfile = f)
  dir.create("data/west-yorkshire", recursive = TRUE)
  file.copy(f, "data/west-yorkshire/")
}
d_all = read_csv(file_name)
d_all = d_all |>
  mutate(
    mode = tolower(mode),
    offence = tolower(offence),
    disposal = tolower(disposal),
    # First word of text in offence column:
    offence_code = str_extract(offence, "\\w+")
  )
d_all |>
  head(3) |>
  # Convert column names to title case:
  rename_all(snakecase::to_title_case) |>
  # snakecase::to_title_case
  knitr::kable()
```
```{r}
d_all_monthly = d_all |>
mutate(month = lubridate::floor_date(date, "month")) |>
group_by(month) |>
summarise(n = n()) |>
mutate(records = "all")
d_offence = d_all |>
opsnap:::filter_offence_nas()
# table(d_offence$offence) |> sort() |> tail()
# rv86019 use a handheld phone / device whilst driving a motor vehicle on a road
# 357
# rt88966 motor vehicle fail to comply with endorsable s36 traffic sign
# 411
# rt88971 fail to comply with red traffic light
# 679
# rt88975 drive motor vehicle fail to comply with red / green arrow / lane closure traffic light signals
# 1364
# rt88575 drive without due care and attention
# 2917
# rt88576 drive without reasonable consideration to others
# 4992
d_offence_monthly = d_offence |>
mutate(month = lubridate::floor_date(date, "month")) |>
group_by(month) |>
summarise(n = n()) |>
mutate(records = "with_offence")
d_with_location = d_all |>
opsnap:::filter_location_nas()
d_complete = d_offence |>
opsnap:::filter_location_nas()
d_complete_monthly = d_complete |>
mutate(month = lubridate::floor_date(date, "month")) |>
group_by(month) |>
summarise(n = n()) |>
mutate(records = "complete")
d_monthly = bind_rows(d_all_monthly, d_complete_monthly)
# d_monthly = bind_rows(d_all_monthly, d_offence_monthly, d_complete_monthly)
```
```{r}
#| label: stats19
#| include: false
# years = 2021:2022
# dir.create("data/stats19", recursive = TRUE, showWarnings = FALSE)
# collisions_2022 = stats19::get_stats19(year = 2022)
# collisions_2021 = stats19::get_stats19(year = 2021)
# collisions_2020 = stats19::get_stats19(year = 2020)
# collisions = bind_rows(collisions_2020, collisions_2021, collisions_2022)
# write_csv(collisions, "collisions_2020-2022.csv")
# piggyback::pb_upload("collisions_2020-2022.csv")
system("gh release download v1")
collisions = read_csv("collisions_2020-2022.csv")
stats19_monthly = collisions |>
filter(police_force == "West Yorkshire") |>
mutate(month = lubridate::floor_date(date, "month")) |>
group_by(month) |>
summarise(n = n()) |>
mutate(records = "stats19")
table(d_all$mode)
# cyclist horse rider motorcyclist pedestrian
# 7069 456 50 1467
# unknown vehicle driver vehicle passenger
# 526 10145 650
perc_vehicle_driver = d_all |>
count(mode) |>
filter(mode == "vehicle driver") |>
pull(n) / nrow(d_all) * 100
perc_cyclist = d_all |>
count(mode) |>
filter(mode == "cyclist") |>
pull(n) / nrow(d_all) * 100
perc_pedestrian = d_all |>
count(mode) |>
filter(mode == "pedestrian") |>
pull(n) / nrow(d_all) * 100
perc_horse = d_all |>
count(mode) |>
filter(mode == "horse rider") |>
pull(n) / nrow(d_all) * 100
perc_motorcyclist = d_all |>
count(mode) |>
filter(mode == "motorcyclist") |>
pull(n) / nrow(d_all) * 100
perc_other = 100 - perc_vehicle_driver - perc_cyclist - perc_pedestrian - perc_horse - perc_motorcyclist
# table(d_all$disposal)
# conditional offer court dsit investigation educational course
# 2563 321 202 10887
# fine nfa rpu investigation
# 1 6365 24
perc_disposal_educational_course_conditional = d_all |>
  mutate(disposal = case_when(
    disposal == "conditional offer" ~ "educational course",
    TRUE ~ disposal
  )) |>
  count(disposal) |>
  filter(disposal == "educational course") |>
  pull(n) / nrow(d_all) * 100
# Just educational:
perc_disposal_educational = d_all |>
count(disposal) |>
filter(disposal == "educational course") |>
pull(n) / nrow(d_all) * 100
perc_disposal_conditional = d_all |>
count(disposal) |>
filter(disposal == "conditional offer") |>
pull(n) / nrow(d_all) * 100
perc_disposal_court = d_all |>
count(disposal) |>
filter(disposal == "court") |>
pull(n) / nrow(d_all) * 100
perc_undergoing_further_investigation = d_all |>
count(disposal) |>
filter(disposal == "undergoing further investigation") |>
pull(n) / nrow(d_all) * 100
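perc_no_further_action = d_all |>
  count(disposal) |>
  # The disposal values tabulated above use the abbreviation "nfa",
  # so match both spellings and sum in case both are present:
  filter(disposal %in% c("nfa", "no further action")) |>
  pull(n) |>
  sum() / nrow(d_all) * 100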
perc_course = d_all |>
count(disposal) |>
filter(disposal == "conditional offer") |>
pull(n) / nrow(d_all) * 100
date_range = range(d_all$date)
# Records submitted in 2024:
d_all_2024 = d_all |>
filter(date >= "2024-01-01")
table(d_all_2024$date)
```
# Abstract {.unnumbered}
This study uses data from Operation Snap (OpSnap), the UK police's national system to receive road users' video evidence of road traffic offences.
Data from one police force area for 39 months (January 2021 to March 2024) <!-- (N= 18,363 records) --> (N = `r nrow(d_all) |> scales::number(big.mark = ",")` records) is analysed.
<!-- Of submitted cases, 49.9% were from vehicle drivers, 34.4% were from cyclists, and 7.4% were from pedestrians. --> Half were submitted by vehicle drivers (`r round(perc_vehicle_driver, 1)`%), a third by cyclists (`r round(perc_cyclist, 1)`%), `r round(perc_pedestrian, 1)`% by pedestrians, `r round(perc_horse, 1)`% by horse riders, `r round(perc_motorcyclist, 1)`% by motorcyclists, and `r round(perc_other, 1)`% were unknown.
We estimate that, relative to road distance travelled, cyclists were 20 times more likely to submit video evidence than vehicle drivers.
The most common offences overall were driving 'without reasonable consideration to others' or 'without due care and attention'.
<!-- Two thirds (66.1%) of reported cases resulted in the recommended disposal of an educational course (including conditional offers), 31% no further action, and fewer than 1% for court appearance. --> Half (`r round(perc_disposal_educational, 1)`%) of reported cases resulted in the recommended disposal of an educational course, `r round(perc_no_further_action, 1)`% in no further action, `r round(perc_disposal_conditional, 1)`% in a conditional offer, and `r round(perc_disposal_court, 1)`% in a court appearance.
A research agenda using OpSnap data is outlined that could emerge if national datasets are compiled, responsibly opened up and made available for research and policy-making: data-driven research should identify hotspot locations and other correlates of dangerous and antisocial road use at regional and local levels; research projects should investigate disposal-related decision-making, video quality, and the role of supporting evidence; and offence concentration (recidivism, repeat submitters of evidence, spatial hotspots) and case progression, including court cases, should be explored with reference to new video evidence.
We conclude that datasets derived from publicly-uploaded video submission portals have the potential to transform evidence-based policy and practice locally, nationally and internationally.
# Introduction
Dangerous and criminal driving are significant problems that take many forms [@simon1996; @corbett2003; @corbett2010].
There were 1,711 fatalities and 135,480 casualties across all severity categories (slight, serious, fatal) in Britain resulting from car crashes and other road traffic collisions reported to the police in 2022 [@departmentfortransportReportedRoadCasualties2023].
This is just the tip of the iceberg of road traffic incidents, with many more near misses and minor incidents going unreported.
Traditionally, road safety research has relied on retrospective analysis of crashes to assess and enhance road safety [@towards2008].
However, relying on historical crash data poses several challenges.
Crashes are rare events [@elvik2006], so extended periods of data, often at least five years, are required to obtain statistically significant estimates [@songchitruksa2006].
Furthermore, the reactive nature of crash analysis means that safety improvements only follow after crashes occur, which is both inefficient and ethically problematic.
Surrogate safety measures offer an alternative to casualty data by using more frequently observable, less severe, traffic events to identify road safety issues [@songchitruksa2006].
One prominent surrogate measure is traffic conflicts [@lord2021].
A traffic conflict occurs when road users’ paths intersect such that a collision would be likely if no evasive action were taken [@tarko2018].
The use of traffic conflicts is illustrated by Hyden’s safety pyramid [@hyden1987], which represents the spectrum of traffic events from safe, undisturbed passages at the base, to severe, rare crashes at the top.
Hyden’s model demonstrates the inverse relationship between crash frequency and severity.
Understanding this relationship allows for the prediction of severe crashes based on the more common, less severe conflicts.
Traditional methods for observing traffic conflicts have relied on manual observation, which is resource intensive.
The present paper highlights the potential of using open access data from Operation Snap, the UK police’s national system to receive road users’ video evidence of road traffic offences, as a surrogate measure of road safety.
<!-- Most injuries involve motor vehicle occupants, however when accounting for the distance travelled by road user groups, cyclists are over-represented in injury statistics [@departmentfortransportReportedRoadCasualties2023].
Between 2004 and 2022, an average of 104 cyclists were killed and 4,212 were seriously injured each year in Britain according to police records [@departmentfortransport].
Almost half of cyclist fatalities involved collision with a car, with 56% on rural roads (compared to 30% of traffic).
The 2023 report noted that "the most common contributory factor allocated to pedal cyclists in fatal or serious collisions (FSC) with another vehicle was 'driver or rider failed to look properly'" [@departmentfortransport]. -->
<!-- The context for the study is that, between 2004 and 2022, an average of 104 cyclists were killed and 4,212 were seriously injured each year according to official records recorded by police forces across Great Britain [@departmentfortransport].
Almost half of cyclist fatalities involved collision with a car, with 56% on rural roads (compared to 30% of traffic).
The 2023 report noted that “the most common contributory factor allocated to pedal cyclists in fatal or serious collisions (FSC) with another vehicle was ‘driver or rider failed to look properly’” [@departmentfortransport]. -->
In what follows, it emerges that, relative to road distance travelled, cyclists were 20 times more likely than vehicle drivers to submit video evidence.
Cyclist crashes are underreported in police-recorded datasets [@elvikIncompleteAccidentReporting1999] and the extent of injuries sustained by cyclists may be higher when hospital-recorded cases are counted [@janstrupetal.UnderstandingTrafficCrash2016].
Under-reporting in police datasets is likely to be greatest for minor injuries, while records are seldom kept when a collision or injury is avoided due to riders or drivers taking evasive actions [@ibrahimCyclingMissesReview2021].
These incidents are often referred to as "near-misses" [@ibrahimCyclingMissesReview2021].
Commuter cyclists in the UK experience a near miss for every six miles of riding [@aldredInvestigatingRatesImpacts2015a], and concern over near misses is a key reason people choose not to cycle [@sandersPerceivedTrafficRisk2015a].
Near misses are associated with inattentive driving, aggressive driving, driving too fast, passing too close, being car-doored, and being cut off by turning drivers [@sandersPerceivedTrafficRisk2015a; @cubbin2024].
A review of near-miss cycling crashes highlighted the need for better data to inform safety research [@ibrahimCyclingMissesReview2021].
Close passes are the most common type of near miss reported by cyclists and are associated with collisions resulting in injury [@aldred2016].
A 'close pass' occurs when a vehicle passes too close to a cyclist, which is defined in the UK as less than 1.5 metres away at 30 mph (approximately 50 km/h).
<!-- Close passes take different forms including the 'punishment pass' by angry drivers for a perceived slight such as causing the driver to slow down [@cubbin2024]. --> There is no specific offence in the UK Road Traffic Act 1988 for driving too close to a cyclist, but two careless driving offences are commonly applied: RT88575, driving without due care and attention; and RT88576, driving without reasonable consideration to others.
Both are prominent in the analysis that follows.
Operation Snap, often referred to informally as 'OpSnap', was piloted by North Wales Police in October 2016 and adopted by all Welsh forces by 2018.
It is now in operation nationally across England and Wales, each police force offering its own submission portal for road users to submit video evidence.
The nature of video submissions and the related expectations were summarised on the website of one Police and Crime Commissioner as follows:
- The secure form is for traffic offences, it is NOT for submitting footage of road traffic collisions, any other offences or for parking issues.
- The car registration number of the offending vehicle must be clearly visible.
- The public should be prepared to sign a witness statement and possibly give evidence in court.
- Statements for OpSnap can only be accepted from persons aged 18 or over. If you are under 18 the incident should be reported by email.
This is, to our knowledge, the first research study to use this dataset.
As such, the study is offered as proof-of-concept of the potential for its further analysis to improve road safety.
Following analysis and discussion of three years of data for one police force area, the study outlines a research agenda designed to inform policy and practice.
# Methods and data {#sec-methods}
Open access OpSnap data from West Yorkshire Safer Roads was used for this study.
The media submissions portal opened in July 2020, and available data from the West Yorkshire Police (WYP) used in this paper span the calendar years 2021, 2022 and 2023.
In 2021 there were fewer than half as many cases as in either 2022 or 2023, which could reflect reduced road use during the COVID-19 pandemic and lower awareness of OpSnap soon after its launch.
The dataset is a tabulation of cases submitted to the West Yorkshire Safer Roads OpSnap web portal.
The terms 'record' and 'case' are used interchangeably here to refer to a record in the OpSnap database, each of which represents the separate submission of video evidence by a road user.
The portal allows members of the public to submit video footage of suspected traffic offences committed by motor vehicle drivers.
Video footage is commonly recorded from on-board cameras.
For motor vehicles these cameras are typically mounted on or near the vehicle front dashboard, known as 'dash-cams'.
One source examined GB Driving Licence data to find that, by early 2024, close to a third of private and commercial vehicles in the UK had a dash-cam [@DashCamSubmissions2024].
For cyclists, footage is commonly recorded using helmet or handlebar-mounted cameras.
The proportion of cyclists using these cameras is unknown but anecdotal evidence suggests increased usage, with many choosing to record rides in case an incident occurs.
The proportions of horse riders, pedestrians and motorcyclists recording video footage are, to our knowledge, also unknown.
Complainants upload footage and complete a short form that includes their personal details, the details of the vehicle involved including registration, make, model and colour, the location and time of the incident, and details of the camera used to record the footage.
Only vehicle offences can be reported as the registration number of any offending vehicle is required and must be legible in footage.
The open access data is a deidentified summary of submitted cases with information on mode of transport of the person reporting, offender vehicle details (make, model, colour), offence code, recommended disposal, date of submission, district, and offence location.
The offence location is typically a street name and town or city name, or an intersection, and examples included: 'A58 Godley Road, Halifax', 'Keighley Road, Silsden', 'Woodhouse Lane A660, Leeds'.
For this study, approximate geolocations were obtained using the Google API, with results restricted to locations within West Yorkshire.
Further aspects of the data, their uses and limitations are discussed in what follows.
A random sample of 5 records from the raw data is shown in @tbl-raw (note: "nfa" refers to "no further action").
```{r}
#| label: tbl-raw
#| tbl-cap: "Random sample of 5 records from the raw data."
set.seed(24)
d_all |>
sample_n(5) |>
select(-district, -`offence_code`) |>
mutate(date = as.Date(date)) |>
arrange(date) |>
rename_all(snakecase::to_title_case) |>
knitr::kable()
# # For PDF:
# # |>
# # Striped styling and tiny text to fit:
# kableExtra::kable_styling(latex_options = "striped") |>
# # Set column widths:
# kableExtra::column_spec(1, width = "3em") |>
# kableExtra::column_spec(2, width = "3em") |>
# kableExtra::column_spec(3, width = "3em") |>
# kableExtra::column_spec(4, width = "3em") |>
# kableExtra::column_spec(5, width = "9em") |>
# kableExtra::column_spec(6, width = "4em") |>
# kableExtra::column_spec(8, width = "8em")
```
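The cleaning applied to the text columns before analysis can be sketched as follows.
This is a minimal illustration rather than the package's own implementation: the two rows in `d_example` are hypothetical (their casing is assumed), and only the lower-casing and the extraction of the leading offence code mirror the steps used in this paper.

```{r}
#| eval: false
#| echo: true
library(dplyr)
library(stringr)

# Hypothetical example rows, not taken from the OpSnap dataset
d_example = tibble::tibble(
  mode = c("Cyclist", "Vehicle Driver"),
  offence = c(
    "RT88576 Drive without reasonable consideration to others",
    "RT88575 Drive without due care and attention"
  )
)

d_clean = d_example |>
  mutate(
    mode = tolower(mode),
    offence = tolower(offence),
    # First run of word characters, i.e. the leading offence code
    offence_code = str_extract(offence, "\\w+")
  )
```

Keeping the code in its own column allows records to be grouped by offence code even where the free-text description varies.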
# Results
There were `r nrow(d_all) |> scales::number(big.mark = ",")` records in the dataset for the three-year study period, with a strong upward trend, as shown in the monthly counts presented in @fig-time.
Since early 2022, there have been more monthly records in the OpSnap data than in the official 'STATS19' road traffic collision records for West Yorkshire, highlighting the under-reporting of road traffic incidents in official statistics.
STATS19 records are from the Department for Transport's database of road traffic collisions reported to the police and only include incidents that result in injury.
Like OpSnap data, STATS19 records are open access.
For the results presented in @fig-time, STATS19 datasets were downloaded with the `stats19` R package [@lovelace2019] and filtered to include only records from West Yorkshire.
```{r}
#| label: fig-time
#| fig-cap: "Monthly count of Operation Snap records (all records in red; complete records with offence and location data in green) and official STATS19 road traffic collision records (blue), West Yorkshire."
d_monthly = bind_rows(d_monthly, stats19_monthly)
d_monthly |>
  mutate(records = case_when(
    records == "all" ~ "OpSnap (all)",
    records == "complete" ~ "OpSnap (complete)",
    records == "stats19" ~ "STATS19"
  )) |>
  rename_all(snakecase::to_title_case) |>
  ggplot() +
  geom_line(aes(Month, N, colour = Records), alpha = 0.5, size = 2) +
  # geom_smooth(aes(month, n, colour = records), method = "lm", se = FALSE) +
  labs(
    # title = "Number of monthly records in West Yorkshire Police\nOperation Snap data",
    x = "Date",
    y = "Number of records per month"
  ) +
  theme_minimal()
```
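The monthly series for a single police force can be derived along the following lines.
This is a sketch under stated assumptions: `collisions_example` is a hypothetical stand-in for the STATS19 collision table, with `date` and `police_force` columns assumed to match those used in this paper, and the rows are invented for illustration.

```{r}
#| eval: false
#| echo: true
library(dplyr)
library(lubridate)

# Hypothetical collision records, for illustration only
collisions_example = tibble::tibble(
  date = as.Date(c("2022-01-05", "2022-01-20", "2022-02-11", "2022-02-14")),
  police_force = c("West Yorkshire", "West Yorkshire", "West Yorkshire", "Other force")
)

example_monthly = collisions_example |>
  filter(police_force == "West Yorkshire") |>
  # Round each date down to the first day of its month
  mutate(month = floor_date(date, "month")) |>
  count(month) |>
  mutate(records = "stats19")
```

Applying the same rounding to the OpSnap submission dates puts both sources on a common monthly axis, which is what allows them to be plotted together.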
Some records lacked either an offence (`r format(nrow(d_all) - nrow(d_offence), big.mark = ",", scientific = FALSE)`, `r round((nrow(d_all) - nrow(d_offence)) / nrow(d_all) * 100, 1)`%) or a location (`r format(nrow(d_offence) - nrow(d_complete), big.mark = ",", scientific = FALSE)`, `r round((nrow(d_offence) - nrow(d_complete)) / nrow(d_all) * 100, 1)`%), or both, leaving `r round(nrow(d_complete) / nrow(d_all) * 100, 1)`% or `r format(nrow(d_complete), big.mark = ",", scientific = FALSE)` complete records.
There was a distinct seasonal pattern to reporting, with significant increases in summer months.
A summary of the `r nrow(d_offence) |> scales::number(big.mark = ",")` records with an offence is presented in @tbl-offences.
Included in the table are the number and percentage of records by offence type, showing the top 6 offence types and the remainder grouped as ‘Other’.
The most common offences were ‘Driving without reasonable consideration to others (rt88576)’ and ‘Driving without due care and attention (rt88575)’.
Within the Road Traffic Act these offences are related to careless driving and drivers are subject to similar penalties.
The other common offences included failing to comply with traffic signals or traffic signs, and using a handheld phone while driving.
<!-- TODO: Within the other category... -->
```{r}
#| label: tbl-offences
#| include: true
#| tbl-cap: "Offence types reported (top 6 and other)."
#| width: 60%
# Aim: get table of n. offences by mode
d_mode_offence_count = d_all |>
count(mode, offence, sort = TRUE)
# offences in order of n. offences
d_offence_count = d_all |>
count(offence, sort = TRUE)
# d_offence_count |>
# arrange(desc(n)) |>
# head(20) |>
# knitr::kable()
# |offence | n|
# |:------------------------------------------------------------------------------------------------------|----:|
# |n/a | 5706|
# |rt88576 drive without reasonable consideration to others | 4992|
# |rt88575 drive without due care and attention | 2917|
# |rt88975 drive motor vehicle fail to comply with red / green arrow / lane closure traffic light signals | 1364|
# |rt88971 fail to comply with red traffic light | 679|
# |rt88966 motor vehicle fail to comply with endorsable s36 traffic sign | 411|
# |rv86019 use a handheld phone / device whilst driving a motor vehicle on a road | 357|
# |rt88760 fail to comply with solid white lines | 265|
# |rt88751 contravene give way sign | 264|
# |suspected contravene weight restriction. | 213|
# |rt88751 contravene mandatory direction arrows | 212|
# |me82009 driving on hard shoulder of motorway | 113|
# |rt88975 fail to comply with red traffic light | 109|
# |rt88751 motor vehicle fail to comply with a non-endorsable traffic sign other (specify) | 91|
# |zp97004 fail to comply with red light pelican crossing | 84|
# |zp97003 stop within controlled area of pelican crossing | 81|
# |rc86814 driver not in proper control of vehicle | 62|
# |zp97001 stop vehicle within limits of pelican crossing | 59|
# |hy35001 drive/ride on footpath beside a road | 42|
# |rr84171 vehicle contravene local traffic order other than parking (e.g. bus lane) | 41|
# Pull out the top 6 offences excluding n/a:
d_offence_top_6 = d_mode_offence_count |>
# filter(offence != "n/a") |>
group_by(offence) |>
summarise(n = sum(n)) |>
arrange(desc(n)) |>
head(6)
d_offence_classified = d_offence_count |>
mutate(
Offence = case_when(
offence %in% d_offence_top_6$offence ~ offence,
TRUE ~ "Other"
)
) |>
group_by(Offence) |>
summarise(`Number of records` = sum(n)) |>
# Arrange in descending order of n except for "Other":
arrange(Offence == "Other", desc(`Number of records`)) |>
# Rename "n/a" to "No offence":
mutate(Offence = case_when(
Offence == "n/a" ~ "NA No offence or unknown offence type",
TRUE ~ Offence
)) |>
mutate(`Percent of records` = round(`Number of records` / sum(`Number of records`) * 100, 1))
d_offence_totals = d_offence_classified |>
summarise(`Number of records` = sum(`Number of records`), `Percent of records` = sum(`Percent of records`)) |>
mutate(Offence = "Total")
# d_offence_classified |>
# knitr::kable()
tbl = d_offence_classified |>
  bind_rows(d_offence_totals) |>
  mutate(
    `Percent of records` = round(`Percent of records`)
  ) |>
  rename(`Number` = `Number of records`, `Percent` = `Percent of records`) |>
  gt() |>
  gt_plt_bar_pct(column = `Percent`, fill = "#252525", scaled = TRUE, labels = TRUE, width = 120, height = 30, font_size = "15px") |>
  text_transform(
    # Split each label into the offence code (before the first space) and its description
    fn = function(x) {
      code = str_extract(x, "^[^ ]+")
      desc = str_remove(x, "^[^ ]+")
      glue::glue("<em><span style='font-size:14px'>{code}</span></em><br><span style='font-size:18px'>{desc}</span>")
    },
    locations = cells_body(columns = Offence)
  ) |>
  cols_width(1 ~ px(600), 2 ~ px(80)) |>
  gt_theme_espn()
# gt::gtsave(tbl, "tbl-offences.html")
# webshot2::webshot("tbl-offences.html", "tbl-offences.png")
# browseURL("tbl-offences.png")
knitr::include_graphics("tbl-offences.png")
```
```{r}
# reclassify offences into types
x = d_all
regroup_offences_simple = function(x) {
x |>
dplyr::mutate(
offence_simple = dplyr::case_when(
stringr::str_detect(offence, "drive without reasonable consideration") ~ "Inconsiderate driving",
stringr::str_detect(offence, "drive without due care") ~ "Careless driving",
TRUE ~ "Other"
)
)
}
# table(x$mode)
regroup_modes = function(x) {
x |>
dplyr::mutate(
mode_simplified = dplyr::case_when(
mode == "cyclist" ~ "Cyclist",
mode == "vehicle driver" ~ "Driver",
TRUE ~ "Other"
))
}
d_all = regroup_offences_simple(d_all)
d_all = regroup_modes(d_all)
```
Vehicle drivers (mostly car and van drivers with dashcams) and cyclists dominate reporting across all records, as illustrated in @tbl-mode.
Half of the cases were reported by vehicle drivers, a third by cyclists, seven percent by pedestrians, with over two percent by horse riders and less than one percent by motorcyclists.
```{r}
#| label: tbl-mode
#| tbl-cap: Mode of transport of person submitting video evidence
#| width: "80%"
# d_all |>
# count(mode, sort = TRUE) |>
# mutate(percent_of_records = n / nrow(d_all)) |>
# mutate(percent_of_records = round(percent_of_records, 3) * 100) |>
# arrange(desc(n)) |>
# rename_all(snakecase::to_title_case) |>
# # Rename N to "Number of records"
# rename(`Number of Records` = N) |>
# knitr::kable()
tbl = d_all |>
# NA is Unknown:
mutate(mode = case_when(
is.na(mode) ~ "unknown",
TRUE ~ mode
)) |>
count(mode, sort = TRUE) |>
mutate(percent_of_records = n / nrow(d_all)) |>
mutate(percent_of_records = round(percent_of_records, 3) * 100) |>
# Sort by number of records:
arrange(desc(n))
tbl_totals = tbl |>
summarise(n = sum(n), percent_of_records = sum(percent_of_records)) |>
mutate(percent_of_records = round(percent_of_records)) |>
mutate(mode = "Total")
tbl = bind_rows(tbl, tbl_totals) |>
rename_all(snakecase::to_title_case) |>
# Rename N to "Number" and shorten the percent column name
rename(`Number` = N, `Percent` = `Percent of Records`) |>
gt() |>
gt_plt_bar_pct(column = `Percent`, fill = "#252525", scaled = TRUE, labels=TRUE, width=120, height=30, font_size = "15px") |>
cols_width(1 ~ px(400), 2 ~ px(80)) |>
tab_style(
style = cell_text(size=px(18)),
locations = cells_body(columns = Mode)
) |>
gt_theme_espn()
# tbl
# gt::gtsave(tbl, "tbl-mode.html")
# webshot2::webshot("tbl-mode.html", "tbl-mode.png")
knitr::include_graphics("tbl-mode.png")
```
<!-- The equivalent table excluding records with missing offence data is shown below: -->
```{r}
#| include: false
d_all |>
opsnap:::filter_offence_nas() |>
filter(offence != "n/a") |>
count(offence, sort = TRUE) |>
mutate(percent_of_records = n / nrow(d_offence)) |>
mutate(percent_of_records = scales::percent(round(percent_of_records, 3))) |>
arrange(desc(n)) |>
head(10) |>
rename_all(snakecase::to_title_case) |>
knitr::kable()
```
<!-- For cases submitted by people riding cycles (shown in @tbl-offences-cyclist-observer), the most common offences were also both associated with careless driving, particularly driving without reasonable consideration to others (78.7%), there were also a small proportion of cases associated with drivers using mobile phones (3.7%), failing to comply with traffic signals (3.5%) and contravening regulator signage (0.6%). -->
```{r}
# # Offence (grouped) | | Total |
# |-------------------------|---------------|-------|
# | | Dangerous driving | Other offences |
# |-------------------------|-------------------|-----------------|
# | **Reporter transport mode** | | |
# | Vehicle driver | **Count** | **5511** | **654** | **6165** |
# | | % within Reporter transport mode | **89.4%** | **10.6%** | **100.0%** |
# | Cyclist | **Count** | **4469** | **249** | **4718** |
# | | % within Reporter transport mode | **94.7%** | **5.3%** | **100.0%** |
# | Other | **Count** | **1279** | **220** | **1499** |
# | | % within Reporter transport mode | **85.3%** | **14.7%** | **100.0%** |
# | **Total** | **Count** | **11259** | **1123** | **12382** |
# | | % within Reporter transport mode | **90.9%** | **9.1%** | **100.0%** |
# R version:
pivot_counts = d_all |>
filter(offence != "n/a") |>
# mutate(
# offence_simple = case_when(
# offence %in% d_offence_top_6$offence ~ offence,
# TRUE ~ "Other"
# )
# ) |>
group_by(mode_simplified, offence_simple) |>
summarise(n = n(), .groups = "drop") |>
pivot_wider(names_from = mode_simplified, values_from = n) |>
mutate(
Total = Cyclist + Driver + Other
) |>
arrange(offence_simple == "Other", desc(Total))
pivot_counts_total = tibble::tibble(
offence_simple = "Total",
Cyclist = sum(pivot_counts$Cyclist),
Driver = sum(pivot_counts$Driver),
Other = sum(pivot_counts$Other),
Total = sum(pivot_counts$Total)
)
pivot_percents = pivot_counts |>
# Calculate percentage of each mode
mutate(
`Cyclist (%)` = Cyclist / sum(Cyclist) * 100,
`Driver (%)` = Driver / sum(Driver) * 100,
`Other (%)` = Other / sum(Other) * 100,
`Total (%)` = Total / sum(Total) * 100
)
pivot_percents_totals = pivot_percents |>
group_by(offence_simple = "Total") |>
summarise_if(is.numeric, sum)
pivot_combined = bind_rows(pivot_percents, pivot_percents_totals)
percent_careless = pivot_combined |> filter(offence_simple == "Careless driving") |> pull(`Total (%)`)
percent_inconsiderate = pivot_combined |> filter(offence_simple == "Inconsiderate driving") |> pull(`Total (%)`)
percent_careless_cycling = pivot_combined |> filter(offence_simple == "Careless driving") |> pull(`Cyclist (%)`)
percent_inconsiderate_cycling = pivot_combined |> filter(offence_simple == "Inconsiderate driving") |> pull(`Cyclist (%)`)
percent_other = pivot_combined |> filter(offence_simple == "Other") |> pull(`Total (%)`)
percent_other_cyclist = pivot_combined |> filter(offence_simple == "Other") |> pull(`Cyclist (%)`)
percent_other_driver = pivot_combined |> filter(offence_simple == "Other") |> pull(`Driver (%)`)
percent_other_other = pivot_combined |> filter(offence_simple == "Other") |> pull(`Other (%)`)
```
A cross-tabulation of transport mode (of the person submitting video evidence) and the offence type is shown in @tbl-mode-offences-crosstab.
For both variables it shows the two main categories plus an 'other' category.
While `r round(percent_inconsiderate, 1)` percent of all offences are for driving without reasonable consideration for others (rt88576), this offence makes up the bulk of offences reported by cyclists (`r round(percent_inconsiderate_cycling, 1)`%).
Drivers are proportionally `r round(percent_other_driver / percent_other_cyclist, 1)` times as likely as cyclists to report other types of offences, while other reporting modes are the most likely to do so, being `r round(percent_other_other / percent_other_cyclist, 1)` times as likely as cyclists.
While further research is needed to understand the reasons for these tendencies, the results match intuition.
Physically-vulnerable cyclists are understandably most concerned with the dangerous driving of vehicles, whereas drivers tend to focus on other types of road traffic offence.
Pedestrians and other reporting modes were also relatively more likely to encounter other types of offence.
```{r}
#| label: tbl-mode-offences-crosstab
#| tbl-cap: Mode of transport of person submitting video (columns).
# pivot_combined |>
# knitr::kable(digits = 1)
tbl = pivot_combined |>
select(-`Total (%)`) |>
filter(offence_simple!="Total") |>
select(-c(Cyclist:Other)) |>
pivot_longer(cols=`Cyclist (%)`:`Other (%)`, names_to="mode", values_to="perc") |>
mutate(
mode=factor(str_extract(mode, "^[^\ ]+"), levels=c("Driver", "Cyclist", "Other")),
offence_simple=factor(offence_simple, levels=c("Inconsiderate driving", "Careless driving", "Other"))
) |>
group_by(offence_simple) |>
summarise(perc=list(round(perc)), count=first(Total)) |>
gt() |>
gt_plt_bar_stack(perc, width=65, labels = c(" Cyclists ", " Drivers ", " Other "),
palette= c("#e31a1c", "#1f78b4", "#bdbdbd")) |>
cols_width(1 ~ px(180), 2 ~ px(400), 3 ~ px(150)) |>
tab_style(
style = cell_text(size=px(18)),
locations = cells_body(columns = c(offence_simple, count))
) |>
gt_theme_espn()
# gt::gtsave(tbl, "tbl-mode-offences-crosstab.png")
knitr::include_graphics("tbl-mode-offences-crosstab.png")
```
```{r}
#| label: tbl-offences-cyclist-observer
#| tbl-cap: "Number and percentages of OpSnap records, submitted by cyclists, by offence type."
#| include: false
d_all |>
opsnap:::filter_offence_nas() |>
filter(offence != "n/a") |>
filter(mode == "cyclist") |>
count(offence, sort = TRUE) |>
mutate(percent_of_records = n / nrow(d_offence)) |>
mutate(percent_of_records = scales::percent(round(percent_of_records, 3))) |>
mutate(
offence = ifelse(n < 20, "other", offence)
) |>
group_by(offence) |>
summarise(n = sum(n), n_hybrid = n()) |>
arrange(n_hybrid, desc(n)) |>
select(-n_hybrid) |>
mutate(`% of total` = scales::percent(n / sum(n), accuracy = 0.1)) |>
rename_all(snakecase::to_title_case) |>
knitr::kable()
perc_no_further_action = d_all |>
count(disposal) |>
filter(disposal == "nfa") |>
pull(n) / nrow(d_all) * 100
perc_course = d_all |>
count(disposal) |>
filter(disposal == "course") |>
pull(n) / nrow(d_all) * 100
```
Disposal categories assigned by police are shown in @tbl-disposal.
Roughly a third of cases (`r round(perc_no_further_action, 1)`%) resulted in no further action and, for most of the remainder, drivers were required to undertake an education course.
Conditional offers, that is, drivers being offered a reduced penalty for admitting guilt, were the third most common outcome.
Nearly two percent of cases went to court and a further one percent underwent further investigation.
```{r}
#| label: tbl-disposal
#| tbl-cap: "Most common disposal values in the OpSnap dataset."
# d_all |>
# count(disposal, sort = TRUE) |>
# mutate(percent_of_records = round(n / nrow(d_all), 3) * 100) |>
# arrange(desc(n)) |>
# rename_all(snakecase::to_title_case) |>
# knitr::kable()
tbl_disposal = d_all |>
count(disposal, sort = TRUE) |>
mutate(percent_of_records = round(n / nrow(d_all), 3) * 100) |>
arrange(desc(n))
tbl_disposal_totals = tbl_disposal |>
summarise(n = sum(n), percent_of_records = round(sum(percent_of_records))) |>
mutate(disposal = "Total")
tbl = bind_rows(tbl_disposal, tbl_disposal_totals) |>
rename_all(snakecase::to_title_case) |>
# Rename N to "Number" and shorten the percent column name
rename(`Number` = N, `Percent` = `Percent of Records`) |>
gt() |>
gt_plt_bar_pct(column = `Percent`, fill = "#252525", scaled = TRUE, labels=TRUE, width=120, height=30, font_size = "15px") |>
cols_width(1 ~ px(400), 2 ~ px(80)) |>
tab_style(
style = cell_text(size=px(18)),
locations = cells_body(columns = Disposal)
) |>
gt_theme_espn()
# gt::gtsave(tbl, "tbl-disposal.html")
# webshot2::webshot("tbl-disposal.html", "tbl-disposal.png")
knitr::include_graphics("tbl-disposal.png")
```
There were `r unique(d_with_location$location) |> length()` unique locations (addresses) in the data, with the most common locations corresponding to busy roads: Meanwood Road (Leeds), Dewsbury Road (Wakefield) and Chapeltown Road (Leeds), with no single address accounting for more than 0.4% of records.
The locations were scrambled by West Yorkshire Police for data protection purposes before being made available, which meant spatial analysis was only possible for ‘all’ video submissions, ignoring different types of road users or offence types.
```{r}
#| echo: false
#| eval: false
# #| label: tbl-locations
# #| tbl-cap: "Most common locations recorded in the OpSnap dataset"
# d_with_location |>
# count(location, sort = TRUE) |>
# mutate(percent_of_records = round(n / nrow(d_with_location), 3) * 100) |>
# arrange(desc(n)) |>
# head(10) |>
# rename_all(snakecase::to_title_case) |>
# knitr::kable()
# if (!file.exists("tbl-locations.png")) {
# tbl = d_with_location |>
# count(location, sort = TRUE) |>
# mutate(percent_of_records = round(n / nrow(d_with_location), 3) * 100) |>
# arrange(desc(n)) |>
# head(10) |>
# rename_all(snakecase::to_title_case) |>
# rename(`Number` = N, `Percent` = `Percent of Records`) |>
# gt() |>
# gt_plt_bar_pct(column = `Percent`, fill = "#252525", scaled = TRUE, labels=TRUE, width=120, height=30, font_size = "15px") |>
# cols_width(1 ~ px(400), 2 ~ px(80)) |>
# tab_style(
# style = cell_text(size=px(18)),
# locations = cells_body(columns = Location)
# ) |>
# gt_theme_espn()
# gt::gtsave(tbl, "tbl-locations.png")
# } else {
# knitr::include_graphics("tbl-locations.png")
# }
# knitr::kable(cars)
```
# Discussion
Drawing on the work and findings described above, this discussion outlines a research agenda.
The breadth and depth of potential policy-relevant work that could be undertaken is, we suggest, enormous.
While this preliminary list will not prove exhaustive, if it stimulates or informs further research then it will have achieved its objective.
There has been an increase in submissions to West Yorkshire Police's OpSnap system since it was set up in 2021.
This may reflect changing levels of road use during and after the Covid-19 pandemic, increased awareness of OpSnap, or increased video camera ownership levels.
Comparative analysis including other police jurisdictions should be undertaken to determine whether this experience is common or isolated.
We hypothesise that it is a common experience for police forces (and other organisations) setting up public video submissions reporting antisocial and dangerous driving, and that the rate of submissions will continue to increase (although, ideally, submission rates will decline in the longer term as roads become safer).
This was a case study of West Yorkshire but each of the 43 police forces in England and Wales collates OpSnap data.
Our preliminary scoping of individual force websites suggests OpSnap data is publicly available nationwide.
A national dataset should be developed that includes OpSnap data from all police forces.
A feasibility study should be undertaken to establish cross-force data availability and compatibility.
A national dataset holds potential for national-level, cross-regional, and comparative analyses of patterns and trends.
It would have the potential to promote cross-national comparative analysis and international cooperation in road safety should similar data be collated elsewhere.
It would hold the potential for the development of rankings according to different criteria and, thereby, potential performance-related metrics [@tiwana2015].
There will be added value from a national dataset that allows identification of cross-jurisdictional issues, such as the same vehicles or persons being involved in incidents in different police force areas.
With submission volumes likely to continue to increase, means of informing triage and prioritisation of cases will be increasingly valuable.
Triage systems are almost certainly used already, and research to identify best practice should be undertaken.
Similarly, research to determine best practice in determining disposal recommendations should be undertaken.
It may prove feasible, perhaps using machine learning, to develop automated means of triage and of determining offence type and disposal.
Research to gauge the differential effectiveness of disposal measures is also needed.
Within the trend of increasing video submissions there were distinct seasonal patterns.
Seasonal variation in submissions reflects seasonal (weather and other) influences, and these will vary by transport mode: cycling, for example, is less prevalent in winter.
This suggests seasonal variation in preventive responses might be tailored to need.
Future research should focus on different types of problem concentration.
It is well established that crime is highly concentrated along whatever dimension is examined [@farrellPreventingRepeatRepeat2017].
With respect to video evidence this will include recidivists (repeat offenders), repeat submitters of evidence (who may or may not be repeat victims/survivors), the concentration of incidents (close passes, near misses and crashes) at certain times and places, with different types of experiences concentrated among certain types of road users.
As discussed in the introduction, close passes are more likely to be reported by cyclists and horse riders than vehicle drivers, both due to the nature of the road user interactions and the relative risk associated with the manoeuvre.
There was preliminary confirmation of this pattern in our analysis which found that offence types reported by vehicle drivers are systematically different to those of other road users.
Hence future research should consider how investigative and preventive approaches can be tailored to different contexts and types of road user.
Rural roads account for a disproportionate number of fatalities [@brakeDirectLineBrake2018], and further research using video evidence may inform preventive approaches tailored to road type.
Policing and prevention efforts focused on where problems are concentrated are more resource efficient and are the foundation of problem-oriented policing [@laingAggressiveDriving2010; @scottSpeedingResidentialAreas2010].
Studies focused on specific types of road user will prove informative with respect to policy and practice.
Dash-cam submissions by motorists offer great potential to inform other aspects of road safety.
Research into submissions by pedestrians, horse riders and motorcyclists should prove viable to improve the safety of these parties.
There were over four hundred submissions by horse riders in West Yorkshire, which means there will be thousands nationwide.
The British Horse Society[^1] and others may be interested in this data being used to inform the safety of horses and riders.
The British Motorcyclists Federation[^2] may be interested in research promoting the safety of its members and in why motorcyclist submissions are relatively infrequent.
[^1]: https://www.bhs.org.uk/
[^2]: https://www.britishmotorcyclists.co.uk/
As an example, cyclists undertake two percent of road miles in the UK [@departmentfortransport].
Other things equal, this suggests that with respect to distance travelled, cyclists were around 20 times more likely to submit evidence of a road traffic offence than vehicle drivers and other road users in West Yorkshire.
Over 90 percent of offences reported by cyclists were for driving without consideration for other road users and without due care and attention.
This is consistent with the phenomenon of 'close passes' in the cycling safety literature [@aldred2016; @cubbin2024].
It suggests the OpSnap data holds the potential for further analysis to inform knowledge about the nature of close passes generally.
For example, it should be possible to identify hotspot locations.
As with much of the further research outlined here, this has the potential to inform preventive policy and practice relating to police interventions, driver behaviour, and the design of roads and roadside environments.
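The exposure comparison for cyclists made above can be sketched as a short calculation. This is a minimal illustration rather than the analysis behind the reported figure: the two inputs are rounded values taken from the text (cyclists submitting about a third of records and accounting for about two percent of road miles), and it assumes that UK-wide mileage shares apply to West Yorkshire.

```r
# Illustrative inputs, rounded from the text (assumptions, not exact data values)
share_submissions_cyclist = 1 / 3  # roughly a third of submissions came from cyclists
share_miles_cyclist = 0.02         # cyclists' approximate share of road miles

# Over-representation: share of submissions relative to share of distance travelled
over_representation = share_submissions_cyclist / share_miles_cyclist

# Relative rate: submissions per unit distance for cyclists compared with
# the same quantity for all other road users combined
rate_cyclist = share_submissions_cyclist / share_miles_cyclist
rate_other = (1 - share_submissions_cyclist) / (1 - share_miles_cyclist)
relative_rate = rate_cyclist / rate_other

round(c(over_representation = over_representation, relative_rate = relative_rate), 1)
```

With these rounded inputs the two measures come out at roughly 17 and 24.5, which brackets the order of magnitude of 'around 20' quoted above; the exact value depends on which shares and which comparison group are used.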
Comparison by complainant mode of transport, reported offence type and disposal, subdivided by other characteristics, is needed.
Determining which type of submission (by which type of road user, for example) is more likely to result in a recommendation of court proceedings may help refine police investigations.
For example, are submissions by cyclists (or horse riders) more or less likely to result in court proceedings than those by motorists?
If so, why?
What is the role of substantive issues, and to what extent are decisions affected by video quality?
Further work with police investigators should inform best practice in the processing and further investigation of submissions.
How are disposal options determined?
Cross-police force comparative analysis may inform national best practice guidelines.
A constraint on the present analysis was the nature of the publicly-available data.
A pilot project should be undertaken in collaboration with police to establish the potential to further enhance policing and public safety using non-public aspects of the submissions data.
This would require working partnerships and a secure research platform to meet the requirements of GDPR and the Data Protection Act 2018.
Such collaborative approaches are increasingly common in health, medicine, and policing research.
Video footage holds significant potential for further analysis, both qualitative and quantitative.
Police investigations might be improved by research to identify and promote good practice in the assessment of footage, its use in determining disposals, further investigations, and prosecutions.
What type of footage works best in the courtroom, and how is it best identified and prepared?
What is the potential for machine learning to identify, clean, and prepare footage of the most serious submissions?
There are also likely to be lessons that can be learned for how footage is gathered, edited and submitted by road users.
Analysis of video footage should be undertaken to identify risk factors, that is, the types of situations in which crashes, near misses, close passes and other offences occur.
Such research can inform policy and practice in ways that ameliorate risk.
Within this area of research, analysis from dash cams is likely to inform different practices than that from cycle-cams, that from pedestrians, horse riders and motorcyclists, and so on.
Some research into the relationship between video footage and supporting written evidence is needed.
Is written supporting evidence always needed?
Which is deemed most important by police, and which by courts?
What are the characteristics of strong supporting written evidence, and what are the characteristics of strong video evidence?
Do both need to be ‘strong’ or can a weakness in one be overcome by particularly strong aspects in the other?
Research should identify further aspects of good practice for those submitting evidence to police.
The spatial analysis undertaken here was obliged to use data for 'all' video submissions.
We were unable to cross-tabulate the geographic location of submissions by different road users because, in the publicly-available data, locations cannot be matched to individual cases: they were scrambled to ensure anonymity for GDPR purposes.
An obvious next step, in the context of a secure research platform, would be spatial analysis for different types of road users, for different types of incidents, for incidents resulting in different disposals, and so on.
Hotspots and spatial clustering are likely to vary by type of road user, type of incident, day of week, time of day, and so on.
We did not include spatial analysis here, but the data hold that potential.
The public data used here only allowed spatial analysis for ‘all’ video submissions.
That is, we were unable to distinguish the geographic location of submissions by different types of road users, or for different offence types, because the locations were scrambled for data protection purposes.
Future research, in the context of a secure data platform, should include spatial analysis for different types of road users, for different types of incidents, and for incidents resulting in different disposals.
Hotspots and spatial clustering will vary by type of road user, type of incident, day of week, time of day, and so on, and parsing the data will produce more informed road safety research.
Accessing OpSnap data in collaboration with police is likely to prove the most fruitful approach for future research.
Were that impractical, research using existing online videos should be undertaken.
There are many thousands of such videos on social media in the public realm, including those posted by police forces, and while there are different sampling issues to consider, this source offers a plausible alternative route to informing road safety.
OpSnap data should be compared to, and integrated with, other road use and road safety datasets.
Here we offered preliminary comparison to the volume of Stats19 data on road crashes.
Spatial analysis integrated with Strava data (road use data by cyclists and pedestrians), Google road use and other data, may facilitate exposure-based measures rather than the counts used here.
The use of rates will enhance the identification of locations with higher risk other than that due to volume of traffic.
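The difference between counts and exposure-based rates can be illustrated with a small sketch. The road segments, submission counts and trip estimates below are entirely hypothetical and are not drawn from OpSnap, Strava or any other dataset; the point is only that the ranking of locations can change once exposure is taken into account.

```r
# Hypothetical road segments: all values are invented for illustration only
segments = data.frame(
  segment = c("A", "B", "C"),
  submissions = c(40, 10, 5),               # count of video submissions
  exposure_trips = c(200000, 20000, 50000)  # assumed estimate of trips on the segment
)

# Exposure-based rate: submissions per 10,000 trips
segments$rate_per_10k = segments$submissions / segments$exposure_trips * 10000

# Rank segments by raw count and by rate (1 = highest)
segments$rank_count = rank(-segments$submissions)
segments$rank_rate = rank(-segments$rate_per_10k)

segments
```

In this invented example segment A has the most submissions, but segment B has the highest rate per trip, so a rate-based measure would flag a different location than a count-based one.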
Future research should recognise that OpSnap data holds the potential for use in the evaluation of experimental interventions.
It may offer the potential for pre-post intervention comparative evaluations using control sites.
Different road safety interventions imply different resource needs.