Skip to content

PPD data model

Peter Inglesby edited this page Jul 25, 2018 · 9 revisions

MDR, dm+d, and CDR

The MDR is the legacy database on which dispenser payments is based. It most notably is mapped to BNF code only, which makes it hard to make use of the rich metadata available in the publicly-available dm+d NHS database. It also has a restriction of 40 characters on the BNF Name field.

dm+d is itself generated from CDR, which is the next-generation payments database that's been in development for around a decade.

CDR notes

BNF Code mappings

We can map from PPD data (which is supplied in MDR format) to dm+d via a mapping which is updated by NHSBSA every quarter. As dm+d is updated monthly, this means it's possible for some actively prescribed dm+d products to have no corresponding BNF Code mapped; however, given the lag of 8+ weeks between a dispensing event and the release of the monthly PPD data, this is unlikely ever to be a problem.

MDR field notes

What is quantity?

What is actual_cost?

  • Doesn't reflect broken bulk, or other costs based on prescribing which cannot be directly attributed to practices. These costs include VAT, nationally unidentified prescribing, broken bulk, schedule adjustments, net cross boundary costs and out of pocket expenses and payments for containers for national unidentified prescribing (source)

dm+d notes

dm+d is the NHS Dictionary of Medicines & Devices and is documented here.

There is a dm+d browser here but its search is pretty slow, and it'd be nice to build our own.

Importing data

Data is published weekly by NHS Digital from TRUD, but we currently download it as part of the monthly import. There is an issue about changing to a weekly download. The code for the task that fetches the data is here, and the code for the task that imports the data is here.

VMPs and AMPs and VMPPs and AMPPs, oh my!

VMPs, AMPs, VMPPs, and AMPPs are the main objects in the dm+d data model.

This is quoted directly from the documentation above:

  • The Virtual Medicinal Product (VMP) describes the generic title for a product including the form and strength, for example Atenolol 100mg tablets.

  • The Virtual Medicinal Product Pack (VMPP) describes the generic title for a generic or proprietary product pack which is known to have been available. The description includes the pack size, for example Atenolol 100mg tablets 28 tablet.

  • The Actual Medicinal Product (AMP) describes an actual product which is known to have been available linked to the name of a particular supplier, for example Tenormin 100mg tablets (AstraZeneca UK Ltd).

  • The Actual Medicinal Product Pack (AMPP) describes an actual product which is known to have been available linked to both the name of a particular supplier and information on the pack size of the product, for example Tenormin 100mg tablets (AstraZeneca UK Ltd) 28 tablet 2 x 14 tablets.

There are foreign key relationships between these four types of object. (These relationships are implicit in our current implementation, but are observed to be enforced in the source.)

The relationships are as follows:

  • A VMP has many AMPs / an AMP belongs to a VMP
  • A VMP has many VMPPs / a VMPP belongs to a VMP
  • An AMP has many AMPPs / an AMPP belongs to an AMP
  • A VMPP has many AMPPs / an AMPP belongs to a VMPP

You can think of the relationship between the four types of object as forming a diamond.

VMP<------+ AMP
 ^           ^
 |           |
 |           |
 |           |
 +           +
VMPP<-----+ AMPP

We store these four types of object in four tables, with the following (simplified) schemas:

VMP (table dmd_vmp)

field description
vpid Primary key
nm Name

VMPP (table dmd_vmpp)

field description
vppid Primary key
vpid Foreign key to VMP
nm Name

AMP (table dmd_amp)

field description
apid Primary key
vpid Foreign key to VMP
nm Name

AMPP (table dmd_ampp)

field description
appid Primary key
apid Foreign key to AMP
vppid Foreign key to VMPP
nm Name

Invariants

There are no dangling references between objects in the diamond

That is, an AMP's VMP exists, a VMPP's VMP exists, and an AMPP's VMPP and AMP both exist.

Eg: SELECT COUNT(*) FROM dmd_amp LEFT OUTER JOIN dmd_vmp ON dmd_amp.vpid = dmd_vmp.vpid WHERE dmd_vmp.vpid IS NULL.

The diamond is consistent

That is, whichever way you traverse the diamond from AMPP to VMP, you get the same result. In other words, an AMPP's AMP's VMP is the same as that AMPP's VMPP's VMP.

SELECT COUNT(dmd_ampp.*)
FROM dmd_ampp
INNER JOIN dmd_amp ON dmd_ampp.apid = dmd_amp.apid
INNER JOIN dmd_vmpp ON dmd_ampp.vppid = dmd_vmpp.vppid
WHERE dmd_amp.vpid != dmd_vmpp.vpid;

Every diamond is complete

That is, every VMP has at least one VMPP, AMP, and AMPP.

WITH
  vmpp_counts AS (
    SELECT
      dmd_vmp.vpid,
      COUNT(dmd_vmpp.*) AS vmpp_count
    FROM dmd_vmp
    INNER JOIN dmd_vmpp ON dmd_vmp.vpid = dmd_vmpp.vpid
    GROUP BY dmd_vmp.vpid
  ),

  amp_counts AS (
    SELECT
      dmd_vmp.vpid,
      COUNT(dmd_amp.*) AS amp_count
    FROM dmd_vmp
    INNER JOIN dmd_amp ON dmd_vmp.vpid = dmd_amp.vpid
    GROUP BY dmd_vmp.vpid
  ),

  ampp_counts AS (
    SELECT
      dmd_vmp.vpid,
      COUNT(dmd_ampp.*) AS ampp_count
    FROM dmd_vmp
    INNER JOIN dmd_vmpp ON dmd_vmp.vpid = dmd_vmpp.vpid
    INNER JOIN dmd_ampp ON dmd_vmpp.vppid = dmd_ampp.vppid
    GROUP BY dmd_vmp.vpid
  )

SELECT
  vmpp_counts.vpid,
  COALESCE(vmpp_count, 0) AS vmpp_count,
  COALESCE(amp_count, 0) AS amp_count,
  COALESCE(ampp_count, 0) AS ampp_count
FROM vmpp_counts
FULL OUTER JOIN amp_counts ON vmpp_counts.vpid = amp_counts.vpid
FULL OUTER JOIN ampp_counts ON vmpp_counts.vpid = ampp_counts.vpid
WHERE vmpp_count = 0 OR amp_count = 0 OR ampp_count = 0

Thinking about the objects belonging to a VMP

I have found it helpful to think of the structure of the data by thinking just about the objects belonging to a single VMP. These objects can be arranged in a 2-dimensional table, with VMPPs as the column headings, AMPs as the row headings, and AMPPs in the table cells.

For example, here's the table for VMP Diclofenac 2.32% gel:

Diclofenac 2.32% gel 100 gram Diclofenac 2.32% gel 30 gram Diclofenac 2.32% gel 50 gram
Diclofenac 2.32% gel (DE Pharmaceuticals) Diclofenac 2.32% gel (DE Pharmaceuticals) 30 gram Diclofenac 2.32% gel (DE Pharmaceuticals) 50 gram
Diclofenac 2.32% gel (Colorama Pharmaceuticals Ltd) Diclofenac 2.32% gel (Colorama Pharmaceuticals Ltd) 30 gram Diclofenac 2.32% gel (Colorama Pharmaceuticals Ltd) 50 gram
Voltarol 12 Hour Emulgel P 2.32% gel (GlaxoSmithKline Consumer Healthcare) Voltarol 12 Hour Emulgel P 2.32% gel (GlaxoSmithKline Consumer Healthcare) 100 gram Voltarol 12 Hour Emulgel P 2.32% gel (GlaxoSmithKline Consumer Healthcare) 30 gram Voltarol 12 Hour Emulgel P 2.32% gel (GlaxoSmithKline Consumer Healthcare) 50 gram

We see that there are 3 VMPPs (for 100g, 30g, and 50g pack sizes), and 3 AMPs (for 3 different manufacturers). There are 7 AMPPs: each manufacturer produces 30g and 50g packs, but only GSK produce a 100g pack.