---
title: "Road User video evidence of road traffic offences: Preliminary analysis of Operation Snap data and suggestions for a research agenda"
bibliography: [references.bib, opsnap.bib]
editor:
  markdown: 
    wrap: sentence
# keywords: 
#   - Road safety
#   - Video evidence
#   - Near misses
#   - Operation Snap
#   - Dangerous driving
#   - Antisocial driving
date: last-modified
format:
  # html: default
  # docx: default
  # pdf: default
  arxiv-pdf:
    keep-tex: true
# linenumbers: true # Add (continuous) line numbers?
# doublespacing: false # Double space the PDF output?
# runninghead: "Preprint" # The text on the top of each page of the output
# authorcols: true # Should authors be listed in a single column (default) or in multiple columns (`authorcols: true`)
execute: 
  echo: false
  warning: false
  cache: false
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE
)
```

```{r}
#| eval: false
#| echo: false
# quarto::quarto_render("paper.qmd", output_format = "pdf")
# file.copy("paper.pdf", "~/OneDrive/opsnap_leeds/paper-no-line-numbers-v12.pdf")
# quarto::quarto_render("paper.qmd", output_format = "arxiv-pdf")
# file.copy("paper.pdf", "paper-v12.pdf")
# file.copy("paper.pdf", "~/OneDrive/opsnap_leeds/paper-v12.pdf")
# quarto::quarto_render("paper.qmd", output_format = "docx")
# # file.copy("paper.docx", "~/OneDrive/opsnap_leeds/paper-v12.docx")
# file.copy("paper.docx", "paper-v12.docx")
# quarto::quarto_render("title.qmd", output_format = "arxiv-pdf")
# browseURL("title.pdf")
# system("gh release upload v1 --clobber paper-v12.docx")
# browseURL("paper.docx")
# # Install extension:
# system("quarto install extension mikemahoney218/quarto-arxiv")
# system("sudo apt install lmodern")
```

```{r setup}
#| include: false
devtools::load_all()
```

```{r}
#| include: false
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
library(tidyverse)
library(gt)
library(gtExtras)
remotes::install_github("ITSLeeds/opsnap")
```

```{r}
#| include: false
#| echo: false
devtools::load_all()
```

```{r}
#| echo: false
#| eval: false
library(rvest)
query = ".file-link a"
url = "https://www.westyorkshire.police.uk/SaferRoadsSubmissions"
files_available = read_html(url) |>
  html_nodes(query) |>
  html_text()
links_available = read_html(url) |>
  html_nodes(query) |>
  html_attr("href")
urls = paste0("https://www.westyorkshire.police.uk", links_available)
file_names = basename(links_available)
tibble::tibble(file_names) |>
  knitr::kable()
```

```{r}
#| include: false
# In preparation for reading-in all data:
# date_str = format(Sys.Date(), "%Y-%m")
date_str = format(as.Date("2024-04-01"), "%Y-%m")
file_name = paste0("data/west-yorkshire/operation_snap_", date_str, ".csv")
```

```{r}
#| eval: false
#| echo: false
# Download them to raw_data/west-yorkshire:
dir.create("raw_data/west-yorkshire", recursive = TRUE, showWarnings = FALSE)
pbapply::pblapply(urls, function(u) {
  download.file(u, paste0("raw_data/west-yorkshire/", basename(u)))
})

# Test with 2nd url:
u = urls[2]
d2 = opsnap:::download_and_read(u)
d_all = purrr::map_df(urls, opsnap:::download_and_read)
dir.create("data/west-yorkshire", recursive = TRUE, showWarnings = FALSE)
# Date to nearest month:
write_csv(d_all, file_name)
```

<!-- The data is open access and looks like this, with names cleaned up by the package: -->

```{r}
#| eval: false
#| echo: true
#| include: false
u = "https://www.westyorkshire.police.uk/sites/default/files/2024-01/operation_snap_oct-dec_2023_0.xlsx"
d = opsnap:::download_and_read(u)
# Old names:
#  [1] "REPORTER TRANSPORT MODE" "OFFENDER VEHICLE MAKE"  
#  [3] "OFFENDER VEHICLE MODEL"  "OFFENDER VEHICLE COLOUR"
#  [5] "OFFENCE"                 "DISTRICT"               
#  [7] "DISPOSAL"                "DATE OF SUBMISSION"     
#  [9] "...9"                    "OFF LOCATION"
# New names:
# [1] "mode"     "make"     "model"    "colour"   "offence"  "district" "disposal"
# [8] "date"     "location"
```

<!-- The data looks like this (first 3 rows shown): -->

```{r}
#| echo: false
#| include: false
file_exists = file.exists(file_name)
if (!file_exists) {
  # stop("try downloading the data first!")
  u = "https://github.com/ITSLeeds/opsnap/releases/download/v1/operation_snap_2024-04.csv"
  f = basename(u)
  download.file(url = u, destfile = f)
  dir.create("data/west-yorkshire", recursive = TRUE)
  file.copy(f, "data/west-yorkshire/")
}
d_all = read_csv(file_name)
d_all = d_all |>
  mutate(
    mode = tolower(mode),
    offence = tolower(offence),
    disposal = tolower(disposal),
    # First word of text in offence column:
    offence_code = str_extract(offence, "\\w+")
  )
d_all |>
  head(3) |>
  # Convert column names to title case:
  rename_all(snakecase::to_title_case) |>
  # snakecase::to_title_case
  knitr::kable()
```

```{r}
d_all_monthly = d_all |>
  mutate(month = lubridate::floor_date(date, "month")) |>
  group_by(month) |>
  summarise(n = n()) |>
  mutate(records = "all")
d_offence = d_all |>
  opsnap:::filter_offence_nas()
# table(d_offence$offence) |> sort() |> tail()
#                         rv86019 use a handheld phone / device whilst driving a motor vehicle on a road 
#                                                                                                    357 
#                                  rt88966 motor vehicle fail to comply with endorsable s36 traffic sign 
#                                                                                                    411 
#                                                          rt88971 fail to comply with red traffic light 
#                                                                                                    679 
# rt88975 drive motor vehicle fail to comply with red / green arrow / lane closure traffic light signals 
#                                                                                                   1364 
#                                                           rt88575 drive without due care and attention 
#                                                                                                   2917 
#                                               rt88576 drive without reasonable consideration to others 
#                                                                                                   4992 
d_offence_monthly = d_offence |>
  mutate(month = lubridate::floor_date(date, "month")) |>
  group_by(month) |>
  summarise(n = n()) |>
  mutate(records = "with_offence")
d_with_location = d_all |>
  opsnap:::filter_location_nas()
d_complete = d_offence |>
  opsnap:::filter_location_nas()
d_complete_monthly = d_complete |>
  mutate(month = lubridate::floor_date(date, "month")) |>
  group_by(month) |>
  summarise(n = n()) |>
  mutate(records = "complete")
d_monthly = bind_rows(d_all_monthly, d_complete_monthly)
# d_monthly = bind_rows(d_all_monthly, d_offence_monthly, d_complete_monthly)
```

```{r}
#| label: stats19
#| include: false
# years = 2021:2022
# dir.create("data/stats19", recursive = TRUE, showWarnings = FALSE)
# collisions_2022 = stats19::get_stats19(year = 2022)
# collisions_2021 = stats19::get_stats19(year = 2021)
# collisions_2020 = stats19::get_stats19(year = 2020)
# collisions = bind_rows(collisions_2020, collisions_2021, collisions_2022)
# write_csv(collisions, "collisions_2020-2022.csv")
# piggyback::pb_upload("collisions_2020-2022.csv")
system("gh release download v1")
collisions = read_csv("collisions_2020-2022.csv")
stats19_monthly = collisions |>
  filter(police_force == "West Yorkshire") |>
  mutate(month = lubridate::floor_date(date, "month")) |>
  group_by(month) |>
  summarise(n = n()) |>
  mutate(records = "stats19")

table(d_all$mode)

          # cyclist       horse rider      motorcyclist        pedestrian 
          #    7069               456                50              1467 
          # unknown    vehicle driver vehicle passenger 
          #     526             10145               650 

perc_vehicle_driver = d_all |>
  count(mode) |>
  filter(mode == "vehicle driver") |>
  pull(n) / nrow(d_all) * 100 
perc_cyclist = d_all |>
  count(mode) |>
  filter(mode == "cyclist") |>
  pull(n) / nrow(d_all) * 100 
perc_pedestrian = d_all |>
  count(mode) |>
  filter(mode == "pedestrian") |>
  pull(n) / nrow(d_all) * 100
perc_horse = d_all |>
  count(mode) |>
  filter(mode == "horse rider") |>
  pull(n) / nrow(d_all) * 100
perc_motorcyclist = d_all |>
  count(mode) |>
  filter(mode == "motorcyclist") |>
  pull(n) / nrow(d_all) * 100
perc_other = 100 - perc_vehicle_driver - perc_cyclist - perc_pedestrian - perc_horse - perc_motorcyclist

# table(d_all$disposal)
#  conditional offer              court dsit investigation educational course 
#               2563                321                202              10887 
#               fine                nfa  rpu investigation 
#                  1               6365                 24 

perc_disposal_educational_course_conditional = d_all |>
  mutate(disposal = case_when(
    disposal == "conditional offer" ~ "educational course",
    TRUE ~ disposal
  )) |>
  count(disposal) |>
  filter(disposal == "educational course") |>
  pull(n) / nrow(d_all) * 100
# Just educational:
perc_disposal_educational = d_all |>
  count(disposal) |>
  filter(disposal == "educational course") |>
  pull(n) / nrow(d_all) * 100
perc_disposal_conditional = d_all |>
  count(disposal) |>
  filter(disposal == "conditional offer") |>
  pull(n) / nrow(d_all) * 100
perc_disposal_court = d_all |>
  count(disposal) |>
  filter(disposal == "court") |>
  pull(n) / nrow(d_all) * 100
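# Disposal labels follow the tabulation above: "nfa" is no further action;
# treating the "dsit"/"rpu investigation" labels as under investigation is an assumption.
perc_undergoing_further_investigation = d_all |>
  count(disposal) |>
  filter(str_detect(disposal, "investigation")) |>
  summarise(n = sum(n)) |>
  pull(n) / nrow(d_all) * 100
perc_no_further_action = d_all |>
  count(disposal) |>
  filter(disposal %in% c("nfa", "no further action")) |>
  summarise(n = sum(n)) |>
  pull(n) / nrow(d_all) * 100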
perc_course = d_all |>
  count(disposal) |>
  filter(disposal == "conditional offer") |>
  pull(n) / nrow(d_all) * 100
date_range = range(d_all$date)
# Records submitted in 2024:
d_all_2024 = d_all |>
  filter(date >= "2024-01-01")
table(d_all_2024$date)
```

# Abstract {.unnumbered}

This study uses data from Operation Snap (OpSnap), the UK police's national system to receive road users' video evidence of road traffic offences.
Data from one police force area for 39 months (January 2021 to March 2024) <!-- (N= 18,363 records) --> (N = `r nrow(d_all) |> scales::number(big.mark = ",")` records) is analysed.
<!-- Of submitted cases, 49.9% were from vehicle drivers, 34.4% were from cyclists, and 7.4% were from pedestrians. --> Half were submitted by vehicle drivers (`r round(perc_vehicle_driver, 1)`%), a third by cyclists (`r round(perc_cyclist, 1)`%), `r round(perc_pedestrian, 1)`% by pedestrians, `r round(perc_horse, 1)`% by horse riders, `r round(perc_motorcyclist, 1)`% by motorcyclists, and `r round(perc_other, 1)`% were unknown.
We estimate that, relative to road distance travelled, cyclists were 20 times more likely to submit video evidence than vehicle drivers.
The most common offences overall were driving 'without reasonable consideration to others' or 'without due care and attention'.
<!-- Two thirds (66.1%) of reported cases resulted in the recommended disposal of an educational course (including conditional offers), 31% no further action, and fewer than 1% for court appearance. --> Half (`r round(perc_disposal_educational, 1)`%) of reported cases resulted in the recommended disposal of an educational course, `r round(perc_no_further_action, 1)`% in no further action, `r round(perc_disposal_conditional, 1)`% in a conditional offer, and `r round(perc_disposal_court, 1)`% in a court appearance.
A research agenda using OpSnap data is outlined that could emerge if national datasets are compiled, responsibly opened up and made available for research and policy-making: data-driven research should identify hotspot locations and other correlates of dangerous and antisocial road use at regional and local levels; research projects should investigate disposal-related decision-making, video quality, and the role of supporting evidence; and offence concentration (recidivism, repeat submitters of evidence, spatial hotspots) and case progression, including court cases, should be explored with reference to new video evidence.
We conclude that datasets derived from publicly-uploaded video submission portals have the potential to transform evidence-based policy and practice locally, nationally and internationally. 

# Introduction

Dangerous and criminal driving are significant problems that take many forms [@simon1996; @corbett2003; @corbett2010].
There were 1,711 fatalities and 135,480 casualties across all severity categories (slight, serious, fatal) in Britain resulting from car crashes and other road traffic collisions reported to the police in 2022 [@departmentfortransportReportedRoadCasualties2023].
This is just the tip of the iceberg of road traffic incidents, with many more near misses and minor incidents going unreported.

Traditionally, road safety research has relied on retrospective analysis of crashes to assess and enhance road safety [@towards2008].
However, relying on historical crash data poses several challenges.
Crashes are rare events [@elvik2006], requiring extended periods, often at least five years, of data to obtain statistically significant estimates [@songchitruksa2006].
Furthermore, the reactive nature of crash analysis means that safety improvements only follow after crashes occur, which is both inefficient and ethically problematic.

Surrogate safety measures offer an alternative to casualty data by using more frequently observable, less severe, traffic events to identify road safety issues [@songchitruksa2006].
One prominent surrogate measure is traffic conflicts [@lord2021].
A traffic conflict occurs when road users’ paths intersect with a risk of collision if no evasive action is taken [@tarko2018].
The use of traffic conflicts is illustrated by Hyden’s safety pyramid [@hyden1987], which represents the spectrum of traffic events from safe, undisturbed passages at the base, to severe, rare crashes at the top.
Hyden’s model demonstrates the inverse relationship between crash frequency and severity.
Understanding this relationship allows for the prediction of severe crashes based on the more common, less severe conflicts.

Traditional methods for observing traffic conflicts have relied on manual observation, which is resource intensive.
The present paper highlights the potential of using open access data from Operation Snap, the UK police’s national system to receive road users’ video evidence of road traffic offences, as a surrogate measure of road safety.

<!-- Most injuries involve motor vehicle occupants, however when accounting for the distance travelled by road user groups, cyclists are over-represented in injury statistics [@departmentfortransportReportedRoadCasualties2023].
Between 2004 and 2022, an average of 104 cyclists were killed and 4,212 were seriously injured each year in Britain according to police records [@departmentfortransport].
Almost half of cyclist fatalities involved collision with a car, with 56% on rural roads (compared to 30% of traffic).
The 2023 report noted that "the most common contributory factor allocated to pedal cyclists in fatal or serious collisions (FSC) with another vehicle was 'driver or rider failed to look properly'" [@departmentfortransport]. -->
<!-- The context for the study is that, between 2004 and 2022, an average of 104 cyclists were killed and 4,212 were seriously injured each year according to official records recorded by police forces across Great Britain [@departmentfortransport].
Almost half of cyclist fatalities involved collision with a car, with 56% on rural roads (compared to 30% of traffic).
The 2023 report noted that “the most common contributory factor allocated to pedal cyclists in fatal or serious collisions (FSC) with another vehicle was ‘driver or rider failed to look properly’” [@departmentfortransport].  -->

In what follows, it emerges that, relative to road distance travelled, cyclists submit video evidence an estimated 20 times more often than vehicle drivers.
Cyclist crashes are underreported in police recorded datasets [@elvikIncompleteAccidentReporting1999] and the extent of injuries sustained by cyclists may be higher when hospital-recorded cases are counted [@janstrupetal.UnderstandingTrafficCrash2016].
Under-reporting in police datasets is likely to be greatest for minor injuries, while records are seldom kept when a collision or injury is avoided due to riders or drivers taking evasive actions [@ibrahimCyclingMissesReview2021].
These incidents are often referred to as "near-misses" [@ibrahimCyclingMissesReview2021].

Commuter cyclists in the UK experience a near miss for every six miles of riding [@aldredInvestigatingRatesImpacts2015a], and concern over near misses is a key reason people choose not to cycle [@sandersPerceivedTrafficRisk2015a].
Near misses are associated with inattentive driving, aggressive driving, driving too fast, passing too close, being car-doored, and being cut off by turning drivers [@sandersPerceivedTrafficRisk2015a; @cubbin2024].
A review of near-miss cycling crashes highlighted the need for better data to inform safety research [@ibrahimCyclingMissesReview2021].

Close passes are the most common type of near miss reported by cyclists and are associated with collisions resulting in injury [@aldred2016].
A 'close pass' occurs when a vehicle passes too close to a cyclist, defined in the UK as leaving less than 1.5 metres at 30 mph (approximately 50 km/h).
<!-- Close passes take different forms including the 'punishment pass' by angry drivers for a perceived slight such as causing the driver to slow down [@cubbin2024]. --> There is no specific offence in the UK Road Traffic Act 1988 for driving too close to a cyclist, but two careless driving offences are commonly applied: RT88575, driving without due care and attention; and RT88576, driving without reasonable consideration to others.
Both are prominent in the analysis that follows.

Operation Snap, often referred to informally as 'OpSnap', was piloted by North Wales police in October 2016 and adopted by all Welsh forces by 2018.
It is now in operation nationally across England and Wales, with each police force offering its own portal for road users to submit video evidence.
The nature of video submissions and the related expectations were summarised on the website of one Police and Crime Commissioner as follows:   

-   The secure form is for traffic offences, it is NOT for submitting footage of road traffic collisions, any other offences or for parking issues. 
-   The car registration number of the offending vehicle must be clearly visible. 
-   The public should be prepared to sign a witness statement and possibly give evidence in court. 
-   Statements for OpSnap can only be accepted from persons aged 18 or over. If you are under 18 the incident should be reported by email. 

This is, to our knowledge, the first research study to use this dataset.
As such, the study is offered as proof-of-concept of the potential for its further analysis to improve road safety.
Following analysis and discussion of three years of data for one police force area, the study outlines a research agenda designed to inform policy and practice. 

# Methods and data {#sec-methods}

Open access data from West Yorkshire Safer Roads from the OpSnap project was used for this study.
The media submissions portal opened in July 2020, and available data from the West Yorkshire Police (WYP) used in this paper span the calendar years 2021, 2022 and 2023.
In 2021 there were fewer than half as many cases as in either 2022 or 2023, which could reflect reduced road use during the COVID-19 pandemic and lower public awareness of OpSnap when it commenced.

The dataset is a tabulation of cases submitted to the West Yorkshire Safer Roads OpSnap web portal.
The terms 'record' and 'case' are used interchangeably here to refer to a record in the OpSnap database, each of which represents the separate submission of video evidence by a road user.

The portal allows members of the public to submit video footage of suspected traffic offences committed by motor vehicle drivers.
Video footage is commonly recorded from on-board cameras.
For motor vehicles these cameras are typically mounted on or near the vehicle front dashboard, known as 'dash-cams'.
One source examined GB Driving Licence data to find that, by early 2024, close to a third of private and commercial vehicles in the UK had a dash-cam [@DashCamSubmissions2024].
For cyclists, footage is commonly recorded using helmet or handlebar-mounted cameras.
The proportion of cyclists using these cameras is unknown but anecdotal evidence suggests increased usage, with many choosing to record rides in case an incident occurs.
The proportions of horse riders, pedestrians and motorcyclists recording video footage are, to our knowledge, also unknown.

Complainants upload footage and complete a short form that includes their personal details, the details of the vehicle involved including registration, make, model and colour, the location and time of the incident, and details of the camera used to record the footage.
Only vehicle offences can be reported as the registration number of any offending vehicle is required and must be legible in footage.
The open access data is a deidentified summary of submitted cases with information on mode of transport of the person reporting, offender vehicle details (make, model, colour), offence code, recommended disposal, date of submission, district, and offence location.
The offence location is typically a street name and town or city name, or an intersection, and examples included: 'A58 Godley Road, Halifax', 'Keighley Road, Silsden', 'Woodhouse Lane A660, Leeds'.
For this study, approximate geolocations were obtained using the Google API, with results restricted to locations within West Yorkshire.
Further aspects of the data, their uses and limitations are discussed in what follows.

A random sample of 5 records from the raw data is shown in @tbl-raw (note: "nfa" refers to "no further action").

```{r}
#| label: tbl-raw
#| tbl-cap: "Random sample of 5 records from the raw data."
set.seed(24)
d_all |>
  sample_n(5) |>
  select(-district, -`offence_code`) |>
  mutate(date = as.Date(date)) |>
  arrange(date) |>
  rename_all(snakecase::to_title_case) |>
  knitr::kable()
  
  # # For PDF:
  # #  |>
  # # Striped styling and tiny text to fit:
  # kableExtra::kable_styling(latex_options = "striped") |>
  # # Set column widths:
  # kableExtra::column_spec(1, width = "3em") |>
  # kableExtra::column_spec(2, width = "3em") |>
  # kableExtra::column_spec(3, width = "3em") |>
  # kableExtra::column_spec(4, width = "3em") |>
  # kableExtra::column_spec(5, width = "9em") |>
  # kableExtra::column_spec(6, width = "4em") |>
  # kableExtra::column_spec(8, width = "8em")
```

# Results

There were `r nrow(d_all) |> scales::number(big.mark = ",")` records in the dataset for the three-year study period, with a strong upward trend, as shown in the monthly counts presented in @fig-time.
Since early 2022, there have been more monthly records in the OpSnap data than in the official 'STATS19' road traffic collision records for West Yorkshire, highlighting the under-reporting of road traffic incidents in official statistics.
STATS19 records are from the Department for Transport's database of road traffic collisions reported to the police and only include incidents that result in injury.
Like OpSnap data, STATS19 records are open access.
For the results presented in @fig-time, STATS19 datasets were downloaded with the `stats19` R package [@lovelace2019] and filtered to include only records from West Yorkshire.
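
The monthly aggregation behind this comparison can be sketched as follows.
This is a minimal illustration on invented toy dates, not the study data; the helper name `monthly_counts` and the toy data frames are hypothetical and are not part of the `opsnap` or `stats19` packages.

```r
library(dplyr)
library(lubridate)

# Toy records standing in for OpSnap submissions and STATS19 collisions
# (dates are invented for illustration only).
toy_opsnap = tibble(date = as.Date(c("2022-01-03", "2022-01-20", "2022-02-11")))
toy_stats19 = tibble(date = as.Date(c("2022-01-15", "2022-02-02", "2022-02-28")))

# Hypothetical helper: count records per calendar month and label the source.
monthly_counts = function(d, label) {
  d |>
    mutate(month = floor_date(date, "month")) |>
    count(month) |>
    mutate(records = label)
}

# Stack the two sources into one long table, ready for plotting by `records`.
toy_monthly = bind_rows(
  monthly_counts(toy_opsnap, "OpSnap (all)"),
  monthly_counts(toy_stats19, "STATS19")
)
toy_monthly
```

Stacking the sources in long format means a single colour aesthetic can distinguish them in one line plot.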

```{r}
#| label: fig-time
#| fig-cap: "Monthly counts of Operation Snap records (all records in red; complete records with offence and location data in green) and official STATS19 road traffic collision records (blue), West Yorkshire."
d_monthly = bind_rows(d_monthly, stats19_monthly)
d_monthly |>
  mutate(records = case_when(
    records == "all" ~ "OpSnap (all)",
    records == "complete" ~ "OpSnap (complete)",
    records == "stats19" ~ "STATS19"
    )
  ) |>
  rename_all(snakecase::to_title_case) |> 
  ggplot() +
  # `linewidth` replaces the deprecated `size` for lines (ggplot2 >= 3.4.0):
  geom_line(aes(Month, N, colour = Records), alpha = 0.5, linewidth = 2) +
  # geom_smooth(aes(month, n, colour = records), method = "lm", se = FALSE) +
  labs(
    # title = "Number of monthly records in West Yorkshire Police\nOperation Snap data",
       x = "Date",
       y = "Number of records per month") +
  theme_minimal()
```

Some records lacked an offence (`r format(nrow(d_all) - nrow(d_offence), big.mark = ",", scientific = FALSE)`, `r round((nrow(d_all) - nrow(d_offence)) / nrow(d_all) * 100, 1)`%) and, of those with an offence, some lacked a location (`r format(nrow(d_offence) - nrow(d_complete), big.mark = ",", scientific = FALSE)`, `r round((nrow(d_offence) - nrow(d_complete)) / nrow(d_all) * 100, 1)`% of all records), leaving `r format(nrow(d_complete), big.mark = ",", scientific = FALSE)` complete records (`r round(nrow(d_complete) / nrow(d_all) * 100, 1)`%).
There was a distinct seasonal pattern to reporting, with significant increases in summer months.
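
The completeness breakdown can be sketched as sequential filtering.
This is a minimal illustration on invented toy records; it assumes that a missing offence is encoded as `"n/a"` or `NA` and a missing location as `NA`, and it does not reproduce the internal filtering helpers of the `opsnap` package.

```r
library(dplyr)

# Toy records (offence values invented for illustration). "n/a" marks a
# missing offence and NA a missing location; the real encoding may differ.
toy = tibble(
  offence  = c("rt88576", "n/a", "rt88575", "rt88971"),
  location = c("Keighley Road, Silsden", "Woodhouse Lane A660, Leeds",
               NA, "A58 Godley Road, Halifax")
)

# Step 1: keep records with an offence; step 2: of those, keep records
# that also have a location.
with_offence = toy |> filter(!is.na(offence), offence != "n/a")
complete = with_offence |> filter(!is.na(location))

# Counts at each step; the three categories partition the full dataset.
completeness = c(
  missing_offence  = nrow(toy) - nrow(with_offence),
  missing_location = nrow(with_offence) - nrow(complete),
  complete         = nrow(complete)
)

# Express each category as a percentage of all records.
round(completeness / nrow(toy) * 100, 1)
```

Because the location filter is applied after the offence filter, the three categories sum to the total number of records.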

A summary of the `r nrow(d_offence) |> scales::number(big.mark = ",")` records with an offence is presented in @tbl-offences.
Included in the table are the number and percentage of records by offence type, showing the top 6 offence types and the remainder grouped as ‘Other’.

The most common offences were ‘Driving without reasonable consideration to others (rt88576)’ and ‘Driving without due care and attention (rt88575)’.
Within the Road Traffic Act these offences are related to careless driving and drivers are subject to similar penalties.
The other common offences included failing to comply with traffic signals, traffic signs and using a handheld phone while driving.
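
The grouping used for the offence table, in which the most frequent offence types are kept and the remainder pooled as 'Other', can be sketched as follows.
This is a minimal illustration on invented toy labels and counts; the variable `top_n_offences` is a hypothetical name, set to 2 here to keep the toy small, whereas the table keeps the top 6.

```r
library(dplyr)

# Toy offence labels (labels and counts invented for illustration only).
toy = tibble(offence = rep(c("a", "b", "c", "d"), times = c(5, 3, 1, 1)))

top_n_offences = 2  # hypothetical setting; the paper's table keeps the top 6

# Count each offence type and identify the most frequent ones.
offence_counts = toy |> count(offence, sort = TRUE)
top_offences = head(offence_counts$offence, top_n_offences)

offence_table = offence_counts |>
  # Pool everything outside the top group into a single "Other" category:
  mutate(Offence = if_else(offence %in% top_offences, offence, "Other")) |>
  group_by(Offence) |>
  summarise(Number = sum(n)) |>
  # Sort by count, but always place "Other" last:
  arrange(Offence == "Other", desc(Number)) |>
  mutate(Percent = round(Number / sum(Number) * 100, 1))
offence_table
```

Sorting on the logical `Offence == "Other"` first keeps the pooled category at the bottom regardless of its size.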
<!-- TODO: Within the other category... -->

```{r}
#| label: tbl-offences
#| include: true
#| tbl-cap: "Offence types reported (top 6 and other)."
#| width: 60%
# Aim: get table of n. offences by mode
d_mode_offence_count = d_all |>
  count(mode, offence, sort = TRUE) 
# offences in order of n. offences
d_offence_count = d_all |>
  count(offence, sort = TRUE)
# d_offence_count |>
#   arrange(desc(n)) |>
#   head(20) |>
#   knitr::kable()
# |offence                                                                                                |    n|
# |:------------------------------------------------------------------------------------------------------|----:|
# |n/a                                                                                                    | 5706|
# |rt88576 drive without reasonable consideration to others                                               | 4992|
# |rt88575 drive without due care and attention                                                           | 2917|
# |rt88975 drive motor vehicle fail to comply with red / green arrow / lane closure traffic light signals | 1364|
# |rt88971 fail to comply with red traffic light                                                          |  679|
# |rt88966 motor vehicle fail to comply with endorsable s36 traffic sign                                  |  411|
# |rv86019 use a handheld phone / device whilst driving a motor vehicle on a road                         |  357|
# |rt88760 fail to comply with solid white lines                                                          |  265|
# |rt88751 contravene give way sign                                                                       |  264|
# |suspected contravene weight restriction.                                                               |  213|
# |rt88751 contravene mandatory direction arrows                                                          |  212|
# |me82009 driving on hard shoulder of motorway                                                           |  113|
# |rt88975 fail to comply with red traffic light                                                          |  109|
# |rt88751 motor vehicle fail to comply with a non-endorsable traffic sign other (specify)                |   91|
# |zp97004 fail to comply with red light pelican crossing                                                 |   84|
# |zp97003 stop within controlled area of pelican crossing                                                |   81|
# |rc86814 driver not in proper control of vehicle                                                        |   62|
# |zp97001 stop vehicle within limits of pelican crossing                                                 |   59|
# |hy35001 drive/ride on footpath beside a road                                                           |   42|
# |rr84171 vehicle contravene local traffic order other than parking (e.g. bus lane)                      |   41|
# Pull out the top 6 offence categories (n/a is included while the filter below is commented out):
d_offence_top_6 = d_mode_offence_count |>
  # filter(offence != "n/a") |>
  group_by(offence) |>
  summarise(n = sum(n)) |>
  arrange(desc(n)) |>
  head(6)
d_offence_classified = d_offence_count |>
  mutate(
    Offence = case_when(
      offence %in% d_offence_top_6$offence ~ offence,
      TRUE ~ "Other"
    )
  ) |>
  group_by(Offence) |>
  summarise(`Number of records` = sum(n)) |>
  # Arrange in descending order of n except for "Other":
  arrange(Offence == "Other", desc(`Number of records`)) |>
  # Rename "n/a" to "No offence":
  mutate(Offence = case_when(
    Offence == "n/a" ~ "NA No offence or unknown offence type",
    TRUE ~ Offence
  )) |>
  mutate(`Percent of records` = round(`Number of records` / sum(`Number of records`) * 100, 1))

d_offence_totals = d_offence_classified |>
  summarise(`Number of records` = sum(`Number of records`), `Percent of records` = sum(`Percent of records`)) |>
  mutate(Offence = "Total")

# d_offence_classified |>
#   knitr::kable()

tbl = d_offence_classified |>
  bind_rows(d_offence_totals) |>
  mutate(
    `Percent of records`= round(`Percent of records`)
    ) |> 
  rename(`Number` = `Number of records`, `Percent`=`Percent of records`) |> 
  gt() |>

  gt_plt_bar_pct(column = `Percent`, fill = "#252525", scaled = TRUE, labels=TRUE, width=120, height=30, font_size = "15px") |> 
  text_transform(
    fn = function(x){
      # Split each label into its leading offence code and the remaining description:
      code <- str_extract(x, "^[^ ]+")
      desc <- str_remove(x, "^[^ ]+")
      glue::glue("<em><span style='font-size:14px'>{code}</span></em><br><span style='font-size:18px'>{desc}</span>")
      },
    locations=cells_body(columns=Offence)
  ) |> 
  cols_width(1 ~ px(600), 2 ~ px(80)) |> 
  gt_theme_espn()
# gt::gtsave(tbl, "tbl-offences.html")
# webshot2::webshot("tbl-offences.html", "tbl-offences.png")
# browseURL("tbl-offences.png")
knitr::include_graphics("tbl-offences.png")
```

```{r}
# reclassify offences into types
x = d_all
regroup_offences_simple = function(x) {
 x |> 
   dplyr::mutate(
     offence_simple = dplyr::case_when(
       stringr::str_detect(offence, "drive without reasonable consideration") ~ "Inconsiderate driving",
       stringr::str_detect(offence, "drive without due care") ~ "Careless driving",
       TRUE ~ "Other"
     )
   )
}
# table(x$mode)
regroup_modes = function(x) {
  x |> 
    dplyr::mutate(
      mode_simplified = dplyr::case_when(
      mode == "cyclist" ~ "Cyclist",
      mode == "vehicle driver" ~ "Driver",
      TRUE ~ "Other"
    ))
}

d_all = regroup_offences_simple(d_all)
d_all = regroup_modes(d_all)

```

Vehicle drivers (mostly car and van drivers with dash-cams) and cyclists dominate reporting across all records, as illustrated in @tbl-mode.
Half of the cases were reported by vehicle drivers, a third by cyclists, seven percent by pedestrians, with over two percent by horse riders and less than one percent by motorcyclists.
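
Raw shares of submissions do not account for how far each group travels.
One way to normalise by exposure is a ratio of submissions per unit of distance, sketched below.
This is an assumption about how such a comparison could be formed, not a documented account of the estimate reported in this paper; the function name `reporting_rate_ratio` is hypothetical and the input numbers are purely illustrative, not study data.

```r
# Hypothetical helper: ratio of submissions per unit distance for group A
# relative to group B. Distances can be in any unit, provided both groups
# use the same one.
reporting_rate_ratio = function(n_a, n_b, dist_a, dist_b) {
  (n_a / dist_a) / (n_b / dist_b)
}

# Purely illustrative numbers (NOT from the study): group A submits half
# as many reports as group B but travels one tenth of the distance.
toy_ratio = reporting_rate_ratio(n_a = 30, n_b = 60, dist_a = 1, dist_b = 10)
toy_ratio
```

In this toy case group A submits fewer reports in absolute terms but five times as many per unit of distance, which shows why exposure matters when comparing reporter modes.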

```{r}
#| label: tbl-mode
#| tbl-cap: Mode of transport of person submitting video evidence
#| width: "80%"
# d_all |>
#   count(mode, sort = TRUE) |>
#   mutate(percent_of_records = n / nrow(d_all)) |>
#   mutate(percent_of_records = round(percent_of_records, 3) * 100) |>
#   arrange(desc(n)) |>
#   rename_all(snakecase::to_title_case) |>
#   # Rename N to "Number of records"
#   rename(`Number of Records` = N) |>
#   knitr::kable()

tbl = d_all |>
  # NA is Unknown:
  mutate(mode = case_when(
    is.na(mode) ~ "unknown",
    TRUE ~ mode
  )) |>
  count(mode, sort = TRUE) |>
  mutate(percent_of_records = n / nrow(d_all)) |>
  mutate(percent_of_records = round(percent_of_records, 3) * 100) |>
  # Reclassify Unknown to unknown:
  arrange(desc(n))
tbl_totals = tbl |>
  summarise(n = sum(n), percent_of_records = sum(percent_of_records)) |>
  mutate(percent_of_records = round(percent_of_records)) |>
  mutate(mode = "Total")
tbl = bind_rows(tbl, tbl_totals) |>
  rename_all(snakecase::to_title_case) |>
  # Rename N to "Number of records"
  rename(`Number` = N, `Percent` = `Percent of Records`) |>
  gt() |>
  gt_plt_bar_pct(column = `Percent`, fill = "#252525", scaled = TRUE, labels=TRUE, width=120, height=30, font_size = "15px") |> 
  cols_width(1 ~ px(400), 2 ~ px(80)) |> 
  tab_style(
    style = cell_text(size=px(18)),
    locations = cells_body(columns = Mode)
  ) |>
  gt_theme_espn()
# tbl
# gt::gtsave(tbl, "tbl-mode.html")
# webshot2::webshot("tbl-mode.html", "tbl-mode.png")
knitr::include_graphics("tbl-mode.png")
```

<!-- The equivalent table excluding records with missing offence data is shown below: -->

```{r}
#| include: false
d_all |>
  opsnap:::filter_offence_nas() |>
  filter(offence != "n/a") |>
  count(offence, sort = TRUE) |>
  mutate(percent_of_records = n / nrow(d_offence)) |>
  mutate(percent_of_records = scales::percent(round(percent_of_records, 3))) |>
  arrange(desc(n)) |>
  head(10) |>
  rename_all(snakecase::to_title_case) |>
  knitr::kable()
```

<!-- For cases submitted by people riding cycles (shown in @tbl-offences-cyclist-observer), the most common offences were also both associated with careless driving, particularly driving without reasonable consideration to others (78.7%), there were also a small proportion of cases associated with drivers using mobile phones (3.7%), failing to comply with traffic signals (3.5%) and contravening regulator signage (0.6%). -->

```{r}
# # Offence (grouped)       |               | Total |
# |-------------------------|---------------|-------|
# |                         | Dangerous driving | Other offences | 
# |-------------------------|-------------------|-----------------|
# | **Reporter transport mode** |                   |                 |
# | Vehicle driver          | **Count**           | **5511**        | **654** | **6165** |
# |                         | % within Reporter transport mode | **89.4%** | **10.6%** | **100.0%** |
# | Cyclist                 | **Count**           | **4469**        | **249** | **4718** |
# |                         | % within Reporter transport mode | **94.7%** | **5.3%**  | **100.0%** |
# | Other                   | **Count**           | **1279**        | **220** | **1499** |
# |                         | % within Reporter transport mode | **85.3%** | **14.7%** | **100.0%** |
# | **Total**               | **Count**           | **11259**       | **1123** | **12382** |
# |                         | % within Reporter transport mode | **90.9%** | **9.1%**  | **100.0%** |

# R version:
pivot_counts = d_all |>
  filter(offence != "n/a") |>
#   mutate(
#     offence_simple = case_when(
#       offence %in% d_offence_top_6$offence ~ offence,
#       TRUE ~ "Other"
#   )
# ) |>
  group_by(mode_simplified, offence_simple) |>
  summarise(n = n()) |>
  pivot_wider(names_from = mode_simplified, values_from = n) |> 
  mutate(
    Total = Cyclist + Driver + Other
  ) |>
  arrange(offence_simple == "Other", desc(Total))
pivot_counts_total = tibble::tibble(
  offence_simple = "Total",
  Cyclist = sum(pivot_counts$Cyclist),
  Driver = sum(pivot_counts$Driver),
  Other = sum(pivot_counts$Other),
  Total = sum(pivot_counts$Total)
)

pivot_percents = pivot_counts |>
  # Calculate percentage of each mode
  mutate(
    `Cyclist (%)` = Cyclist / sum(Cyclist) * 100,
    `Driver (%)` = Driver / sum(Driver) * 100,
    `Other (%)` = Other / sum(Other) * 100,
    `Total (%)` = Total / sum(Total) * 100
  )

pivot_percents_totals = pivot_percents |>
  group_by(offence_simple = "Total") |>
  summarise_if(is.numeric, sum) 

pivot_combined = bind_rows(pivot_percents, pivot_percents_totals) 

percent_careless = pivot_combined |> filter(offence_simple == "Careless driving") |> pull(`Total (%)`)
percent_inconsiderate = pivot_combined |> filter(offence_simple == "Inconsiderate driving") |> pull(`Total (%)`)
percent_careless_cycling = pivot_combined |> filter(offence_simple == "Careless driving") |> pull(`Cyclist (%)`)
percent_inconsiderate_cycling = pivot_combined |> filter(offence_simple == "Inconsiderate driving") |> pull(`Cyclist (%)`)
percent_other = pivot_combined |> filter(offence_simple == "Other") |> pull(`Total (%)`)
percent_other_cyclist = pivot_combined |> filter(offence_simple == "Other") |> pull(`Cyclist (%)`)
percent_other_driver = pivot_combined |> filter(offence_simple == "Other") |> pull(`Driver (%)`)
percent_other_other = pivot_combined |> filter(offence_simple == "Other") |> pull(`Other (%)`)

```

A cross-tabulation of transport mode (of the person submitting video evidence) and the offence type is shown in @tbl-mode-offences-crosstab.
For both variables it shows the two main categories plus an 'other' category.

While `r round(percent_inconsiderate, 1)` percent of all offences are for driving without reasonable consideration to others (rt88576), they make up the bulk of offences reported by cyclists (`r round(percent_inconsiderate_cycling, 1)`%).
Drivers are proportionally `r round(percent_other_driver / percent_other_cyclist, 1)` times as likely as cyclists to report other types of offences, while other reporting modes are the most likely to do so, being `r round(percent_other_other / percent_other_cyclist, 1)` times as likely as cyclists.
While further research is needed to understand the reasons for these tendencies, the results match intuition.
Physically-vulnerable cyclists are understandably most concerned with the dangerous driving of vehicles, whereas drivers tend to focus on other types of road traffic offence.
Pedestrians and other reporting modes were also relatively more likely to encounter other types of offence.

```{r}
#| label: tbl-mode-offences-crosstab
#| tbl-cap: Mode of transport of person submitting video (columns).
# pivot_combined |>
#   knitr::kable(digits = 1)
tbl = pivot_combined |>
  select(-`Total (%)`) |> 
  filter(offence_simple!="Total") |> 
  select(-c(Cyclist:Other)) |> 
  pivot_longer(cols=`Cyclist (%)`:`Other (%)`, names_to="mode", values_to="perc") |> 
  mutate(
    mode=factor(str_extract(mode, "^[^\ ]+"), levels=c("Driver", "Cyclist", "Other")),
    offence_simple=factor(offence_simple, levels=c("Inconsiderate driving", "Careless driving", "Other"))
    ) |> 
  group_by(offence_simple) |> 
  summarise(perc=list(round(perc)), count=first(Total)) |> 
  gt() |>
  gt_plt_bar_stack(perc, width=65, labels = c(" Cyclists ", " Drivers ", " Other "),
                   palette= c("#e31a1c", "#1f78b4", "#bdbdbd")) |>
    cols_width(1 ~ px(180), 2 ~ px(400), 3 ~ px(150)) |> 
  tab_style(
    style = cell_text(size=px(18)),
    locations = cells_body(columns = c(offence_simple, count))
    ) |> 
    gt_theme_espn() 
# gt::gtsave(tbl, "tbl-mode-offences-crosstab.png")
knitr::include_graphics("tbl-mode-offences-crosstab.png")
```

```{r}
#| label: tbl-offences-cyclist-observer
#| tbl-cap: "Number and percentages of OpSnap records, submitted by cyclists, by offence type."
#| include: false
d_all |>
  opsnap:::filter_offence_nas() |>
  filter(offence != "n/a") |>
  filter(mode == "cyclist") |>
  count(offence, sort = TRUE) |>
  mutate(percent_of_records = n / nrow(d_offence)) |>
  mutate(percent_of_records = scales::percent(round(percent_of_records, 3))) |>
  mutate(
    offence = ifelse(n < 20, "other", offence)
  ) |>
  group_by(offence) |>
  summarise(n = sum(n), n_hybrid = n()) |>
  arrange(n_hybrid, desc(n)) |>
  select(-n_hybrid) |>
  mutate(`% of total` = scales::percent(n / sum(n), accuracy = 0.1)) |>
  rename_all(snakecase::to_title_case) |> 
  knitr::kable()

perc_no_further_action = d_all |>
  count(disposal) |>
  filter(disposal == "nfa") |>
  pull(n) / nrow(d_all) * 100

perc_course = d_all |>
  count(disposal) |>
  filter(disposal == "course") |>
  pull(n) / nrow(d_all) * 100
```

Disposal categories assigned by police are shown in @tbl-disposal.
Roughly a third of cases (`r round(perc_no_further_action, 1)`%) resulted in no further action and, for most of the remainder, drivers were required to undertake an education course.
Conditional offers, that is, drivers being offered a reduced penalty for admitting guilt, were the third most common outcome.
Nearly two percent of cases went to court and a further one percent underwent further investigation.

```{r}
#| label: tbl-disposal
#| tbl-cap: "Most common disposal values in the OpSnap dataset."
# d_all |>
#   count(disposal, sort = TRUE) |>
#   mutate(percent_of_records = round(n / nrow(d_all), 3) * 100) |>
#   arrange(desc(n)) |>
#   rename_all(snakecase::to_title_case) |> 
#   knitr::kable()


tbl_disposal = d_all |>
  count(disposal, sort = TRUE) |>
  mutate(percent_of_records = round(n / nrow(d_all), 3) * 100) |>
  arrange(desc(n)) 

tbl_disposal_totals = tbl_disposal |>
  summarise(n = sum(n), percent_of_records = round(sum(percent_of_records))) |>
  mutate(disposal = "Total")

tbl = bind_rows(tbl_disposal, tbl_disposal_totals) |>
  rename_all(snakecase::to_title_case) |>
  # Rename N to "Number of records"
  rename(`Number` = N, `Percent` = `Percent of Records`) |>
  gt() |>
  gt_plt_bar_pct(column = `Percent`, fill = "#252525", scaled = TRUE, labels=TRUE, width=120, height=30, font_size = "15px") |> 
  cols_width(1 ~ px(400), 2 ~ px(80)) |> 
  tab_style(
    style = cell_text(size=px(18)),
    locations = cells_body(columns = Disposal)
  ) |>
  gt_theme_espn()
# gt::gtsave(tbl, "tbl-disposal.html")
# webshot2::webshot("tbl-disposal.html", "tbl-disposal.png")
knitr::include_graphics("tbl-disposal.png")
```

There were `r unique(d_with_location$location) |> length()` unique locations (addresses) in the data.
The most common locations corresponded to busy roads: Meanwood Road (Leeds), Dewsbury Road (Wakefield) and Chapeltown Road (Leeds), with no single address accounting for more than 0.4% of records.
The locations were scrambled by West Yorkshire Police for data protection purposes before being made available, which meant spatial analysis was only possible for ‘all’ video submissions, without distinguishing between types of road user or offence.

```{r}
#| echo: false
#| eval: false
# #| label: tbl-locations
# #| tbl-cap: "Most common locations recorded in the OpSnap dataset"
# d_with_location |>
#   count(location, sort = TRUE) |>
#   mutate(percent_of_records = round(n / nrow(d_with_location), 3) * 100) |>
#   arrange(desc(n)) |>
#   head(10) |>
#   rename_all(snakecase::to_title_case) |> 
#   knitr::kable()
# if (!file.exists("tbl-locations.png")) {
# tbl = d_with_location |>
#   count(location, sort = TRUE) |>
#   mutate(percent_of_records = round(n / nrow(d_with_location), 3) * 100) |>
#   arrange(desc(n)) |>
#   head(10) |>
#   rename_all(snakecase::to_title_case) |>
#   rename(`Number` = N, `Percent` = `Percent of Records`) |>
#   gt() |>
#   gt_plt_bar_pct(column = `Percent`, fill = "#252525", scaled = TRUE, labels=TRUE, width=120, height=30, font_size = "15px") |> 
#   cols_width(1 ~ px(400), 2 ~ px(80)) |> 
#   tab_style(
#     style = cell_text(size=px(18)),
#     locations = cells_body(columns = Location)
#   ) |>
#   gt_theme_espn()
# gt::gtsave(tbl, "tbl-locations.png")
# } else {
# knitr::include_graphics("tbl-locations.png")
# }
# knitr::kable(cars)
```

# Discussion

Drawing on the work and findings described above, this discussion outlines a research agenda.
The breadth and depth of potential policy-relevant work that could be undertaken is, we suggest, enormous.
While this preliminary list will not prove exhaustive, if it stimulates or informs further research then it will have achieved its objective.

There has been an increase in submissions to West Yorkshire Police's OpSnap system since it was set up in 2021.
This may reflect changing levels of road use during and after the Covid-19 pandemic, increased awareness of OpSnap, or increased video camera ownership levels.
Comparative analysis including other police jurisdictions should be undertaken to determine whether this experience is common or isolated.
We hypothesise that it is a common experience for police forces (and other organisations) setting up public video submission systems for reporting antisocial and dangerous driving, and that the rate of submissions will continue to increase (in the longer term, ideally, submission rates would decline as roads become safer).

This was a case study of West Yorkshire but each of the 43 police forces in England and Wales collates OpSnap data.
Our preliminary scoping of individual force websites suggests OpSnap data is publicly available nationwide.
A national dataset should be developed that includes OpSnap data from all police forces.
A feasibility study should be undertaken to establish cross-force data availability and compatibility.
A national dataset holds potential for national-level, cross-regional, and comparative analyses of patterns and trends.
It would have the potential to promote cross-national comparative analysis and international cooperation in road safety should similar data be collated elsewhere.
It would hold the potential for the development of rankings according to different criteria and, thereby, potential performance-related metrics [@tiwana2015].
A national dataset would also add value by allowing identification of cross-jurisdictional issues, such as the same vehicles or persons being involved in incidents in different police force areas.

With submission volumes likely to continue to increase, means of informing triage and prioritisation of cases will be increasingly valuable.
Triage systems are almost certainly used already, and research to identify best practice should be undertaken.
Similarly, research to determine best practice in determining disposal recommendations should be undertaken.
It may prove feasible, perhaps using machine learning, to develop automated means of triage and of determining offence type and disposal.
Research to gauge the differential effectiveness of disposal measures is also needed.
Within the trend of increasing video submissions there were distinct seasonal patterns.
Seasonal variation in submissions reflects seasonal (weather and other) influences, and these will vary by transport mode: cycling, for example, is less prevalent in winter.
This suggests that preventive responses might be varied seasonally and tailored to need.

Future research should focus on different types of problem concentration.
It is well established that crime is highly concentrated along whatever dimension is examined [@farrellPreventingRepeatRepeat2017].
With respect to video evidence this will include recidivists (repeat offenders), repeat submitters of evidence (who may or may not be repeat victims/survivors), and the concentration of incidents (close passes, near misses and crashes) at certain times and places, with different types of experience concentrated among certain types of road user.
As discussed in the introduction, close passes are more likely to be reported by cyclists and horse riders than by vehicle drivers, due both to the nature of the road user interactions and to the relative risk associated with the manoeuvre.

There was preliminary confirmation of this pattern in our analysis, which found that offence types reported by vehicle drivers are systematically different to those of other road users.
Hence future research should consider how investigative and preventive approaches can be tailored to different contexts and types of road user.
Rural roads account for a disproportionate number of fatalities [@brakeDirectLineBrake2018], and further research using video evidence may inform preventive approaches tailored to road type.
Focusing policing and prevention efforts where problems are concentrated is more resource efficient and is the foundation of problem-oriented policing [@laingAggressiveDriving2010; @scottSpeedingResidentialAreas2010].

Studies focused on specific types of road user will prove informative with respect to policy and practice.
Dash-cam submissions by motorists offer great potential to inform other aspects of road safety.
Research into submissions by pedestrians, horse riders and motorcyclists should prove a viable way to improve the safety of these groups.
There were over four hundred submissions by horse riders in West Yorkshire, which means there will be thousands nationwide.
The British Horse Society[^1] and others may be interested in this data being used to inform the safety of horses and riders.
The British Motorcyclists Federation[^2] may be interested in research promoting the safety of its members and in why motorcyclist submissions are relatively infrequent.

[^1]: https://www.bhs.org.uk/

[^2]: https://www.britishmotorcyclists.co.uk/

As an example, cyclists undertake two percent of road miles in the UK [@departmentfortransport].
Other things being equal, this suggests that, with respect to distance travelled, cyclists were around 20 times more likely to submit evidence of a road traffic offence than vehicle drivers and other road users in West Yorkshire.
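
One way to make the reasoning behind this estimate explicit is as a ratio of submission rates per unit of exposure. This is a rough sketch and not necessarily the exact calculation used here. The symbols are introduced for illustration only: $s_c$ is cyclists' share of submissions and $e_c$ is their share of road miles, and distance travelled is assumed to be an adequate proxy for exposure.

$$
R = \frac{s_c / e_c}{(1 - s_c) / (1 - e_c)}
$$

With the rounded shares reported above ($s_c \approx 1/3$ and $e_c \approx 0.02$), this gives $R \approx 16.7 / 0.68 \approx 24.5$, which is of the same order as the figure quoted above. The exact value depends on the shares and on the exposure measure used.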

Over 90 percent of offences reported by cyclists were for driving without reasonable consideration for other road users or without due care and attention.
This is consistent with the phenomenon of 'close passes' in the cycling safety literature [@aldred2016; @cubbin2024].
It suggests the OpSnap data holds the potential for further analysis to inform knowledge about the nature of close passes generally.
For example, it should be possible to identify hotspot locations.
As with much of the further research outlined here, this has the potential to inform preventive policy and practice relating to police interventions, driver behaviour, and the design of roads and roadside environments.

Comparison by complainant mode of transport, reported offence type and disposal, subdivided by other characteristics, is needed.
Determining which type of submission (by which type of road user, for example) is more likely to result in a recommendation of court proceedings may help refine police investigations.
For example, are submissions by cyclists (or horse riders) more or less likely to result in court proceedings than those by motorists?
If so, why?
What is the role of substantive issues, and to what extent are decisions affected by video quality?
Further work with police investigators should inform best practice in the processing and further investigation of submissions.
How are disposal options determined?
Cross-police force comparative analysis may inform national best practice guidelines.

A constraint on the present analysis was the nature of the publicly-available data.
A pilot project should be undertaken in collaboration with police to establish the potential to further enhance policing and public safety using non-public aspects of the submissions data.
This would require working partnerships and a secure research platform to meet the requirements of GDPR and the Data Protection Act (2018).
Such collaborative approaches are increasingly common in health, medicine, and policing research.

Video footage holds significant potential for further analysis, both qualitative and quantitative.
Police investigations might be improved by research to identify and promote good practice in the assessment of footage, its use in determining disposals, further investigations, and prosecutions.
What type of footage works best in the courtroom, and how is it best identified and prepared?
What is the potential for machine learning to identify, clean, and prepare footage of the most serious submissions?
There are also likely to be lessons that can be learned for how footage is gathered, edited and submitted by road users.

Analysis of video footage should be undertaken to identify risk factors, that is, the types of situations in which crashes, near misses, close passes and other offences occur.
Such research can inform policy and practice in ways that ameliorate risk.
Within this area of research, footage from dash cams is likely to inform different practices than footage from cycle-cams, pedestrians, horse riders or motorcyclists.

Some research into the relationship between video footage and supporting written evidence is needed.
Is written supporting evidence always needed?
Which is deemed most important by police, and which by courts?
What are the characteristics of strong supporting written evidence, and what are the characteristics of strong video evidence?
Do both need to be ‘strong’ or can a weakness in one be overcome by particularly strong aspects in the other?
Research should identify further aspects of good practice for those submitting evidence to police.

The spatial analysis undertaken here was obliged to use data for 'all' video submissions.
We were unable to cross-tabulate the geographic location of submissions by different road users because, in the publicly-available data, locations cannot be matched to individual cases: they were scrambled to ensure anonymity for GDPR purposes.
An obvious next step, in the context of a secure research platform, would be spatial analysis for different types of road users, for different types of incidents, for incidents resulting in different disposals, and so on.
Hotspots and spatial clustering are likely to vary by type of road user, type of incident, day of week, time of day, and so on.

We did not include spatial analysis here, but the data hold that potential.
The public data used here only allowed spatial analysis for ‘all’ video submissions.
That is, we were unable to distinguish the geographic location of submissions by different types of road users, or for different offence types, because the locations were scrambled for data protection purposes.
Future research, in the context of a secure data platform, should include spatial analysis for different types of road users, for different types of incidents, and for incidents resulting in different disposals.
Hotspots and spatial clustering will vary by type of road user, type of incident, day of week, time of day, and so on, and parsing the data will produce more informed road safety research.

Accessing OpSnap data in collaboration with police is likely to prove the most fruitful approach for future research.
Were that impractical, research using existing online videos should be undertaken.
There are many thousands of such videos on social media in the public realm, including those posted by police forces, and while there are different sampling issues to consider, this source offers a plausible alternative route to informing road safety.

OpSnap data should be compared to, and integrated with, other road use and road safety datasets.
Here we offered preliminary comparison to the volume of Stats19 data on road crashes.
Spatial analysis integrated with Strava data (road use data from cyclists and pedestrians), Google road use data and other sources may facilitate exposure-based measures rather than the counts used here.
The use of rates will enhance the identification of locations where risk is elevated for reasons other than the volume of traffic.
Future research should recognise that OpSnap data holds the potential for use in the evaluation of experimental interventions.
It may offer the potential for pre-post intervention comparative evaluations using control sites.
Different road safety interventions imply different resource needs.
Ideally, evaluations would include cost-benefit analysis of the portfolio of social and economic costs involved, including the cost of death and injury.
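
As a rough illustration of the difference between the counts used here and the exposure-based rates discussed above, the following sketch ranks sites by submissions per unit of exposure. All values are invented placeholders: the site names, submission counts and exposure figures are hypothetical and are not taken from OpSnap, Strava or any traffic-count dataset.

```r
# Hypothetical sites: every value below is an invented placeholder for illustration.
sites = data.frame(
  site = c("A", "B", "C"),
  submissions = c(40, 10, 25),            # count of video submissions
  exposure_km = c(200000, 20000, 250000)  # assumed exposure, e.g. distance travelled
)

# Submissions per 100,000 km of exposure
sites$rate_per_100k_km = sites$submissions / sites$exposure_km * 1e5

# Rank sites by rate rather than by raw count
sites_ranked = sites[order(sites$rate_per_100k_km, decreasing = TRUE), ]
sites_ranked
```

In this toy example site B has the fewest submissions but the highest rate, which is the kind of reversal that a count-based analysis cannot reveal.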

# Conclusion

This is, we believe, the first study to use the open access Operation Snap data.
It provides proof-of-concept that OpSnap data holds considerable research potential.
OpSnap data has, we suggest, significant potential to inform policy and practice promoting road safety.
Drawing on the preliminary analysis of the West Yorkshire data, a research agenda was outlined.
Our preliminary list will prove far from exhaustive.
And while aspects of that agenda may appear ambitious at the time of writing, we suggest that the present study will have achieved one of its aims if this agenda is rapidly superseded.

# Acknowledgements

[To be added after peer review]

<!--

We would like to thank Ian King for making the connections that led to this paper being written.
Thanks to Roger Beecham who wrote code for visualising some of the tables/figures in the paper, testing and supporting with reproducibility. 
Thanks to Simon D'vali and colleagues from City of Bradford Metropolitan Council for their support and advice on the project and colleagues from the Safer Cycling Yorkshire group.

-->

# References