Skip to content
This repository has been archived by the owner on Oct 19, 2023. It is now read-only.

2.8 Modelling approaches

Matt Woodburn edited this page Jun 9, 2022 · 13 revisions

ObjectGroups and relationships

In many ways, the terms that are used to characterise the collection objects within an ObjectGroup are similar or identical to those that are used to describe an individual collection object. However, there is a fundamental difference in the relationship with those properties between the two examples. For an ObjectGroup, representing multiple objects, there is always the potential for there to be more than one value for any of those terms.

For example, a single object will only have one preservation method (e.g. ‘dried and pinned’), whereas an ObjectGroup can represent objects with a variety of preservation methods. The former represents a one-to-many relationship between the term and the object (an object may not have more than one preservation method, but one preservation method may relate to many objects). The latter has a many-to-many relationship between the term and the ObjectGroup (an ObjectGroup may reflect more than one preservation method, and one preservation method may relate to many ObjectGroups).

Why do relationships matter?

The main impact is on the metrics (represented by the MeasurementOrFact class) that can be attached to an ObjectGroup, and how they can be used. If an ObjectGroup has more than one value for the same term, then it’s not possible to tell how metrics attached to the ObjectGroup are distributed across those values.

For example, if an ObjectGroup has a single preservation method of ‘dried and pinned’, and an ‘object quantity’ (represented by a MeasurementOrFact record) of 10,000, we know that there are 10,000 dried and pinned objects. If however that ObjectGroup has two preservation methods, ‘dried and pinned’ and ‘alcohol’, we know that there are 10,000 objects that are either dried and pinned OR preserved in alcohol, but we cannot calculate how many there are of each.

The only way to get an accurate assessment of the overall object quantity AND the object quantity for each preservation method is to split the ObjectGroup into two ObjectGroups: one containing just the ‘dried and pinned’ objects, and one for the ‘alcohol’ objects. This means that the preservation method maintains a one-to-many relationship with the ObjectGroups, rather than a many-to-many relationship. This is the key to being able to accurately aggregate and report metrics against the preservation method property, as well as the ObjectGroups.

‘Dimensions’ and ‘associations’

We’ve established that:

  • Terms can either have a one-to-many or a many-to-many relationship with the ObjectGroup
  • One-to-many relationships between the ObjectGroup and a term are required to be able to report accurate metrics against that property

Within the Latimer Core model concept, a term where a one-to-many relationship with the ObjectGroup has been enforced is referred to as a dimension. These are the terms that are effectively used to determine how a collection needs to be broken down into multiple ObjectGroups, in order to satisfy the requirements for numeric reporting. It’s important to note that the term or terms to be designated as dimensions will vary between implementations, depending on the use case and requirements, and so are not prescribed as part of the Latimer Core model. They can, however, be defined for an implementation using the CollectionDescriptionScheme structure, as described earlier in the document.

Terms that can have a many-to-many relationship with the ObjectGroup are referred to, for the purposes of this guidance documentation, as associations. They provide information about what is in the ObjectGroup, but cannot generally be quantified using the metrics attached to the ObjectGroup (but see the metric options discussed in section X).

When should dimensions and associations be used?

There are both benefits and limitations to applying terms as either dimensions or associations.

Associations

Associations essentially allow you to use the properties like ObjectGroup tags, attaching a number of values for the same property to a single ObjectGroup (Figure 1).

ObjectGroupWithTwoTerms

Figure 1: An ObjectGroup with two terms (Taxon and GeographicOrigin) attached as associations.

Associations enable you to:

  • reflect the scope of your collections using a range of properties and their associated values
  • keep the data structure relatively simple and maintainable, with a small number of ObjectGroups
  • link between ObjectGroups using common properties and vocabularies

However they will only give you basic figures for your collections, restricted to the absolute numbers in the ObjectGroups, and not reflecting the associated properties.

Dimensions

Using dimensions creates a consistent structure to the breakdown of the collection according to one or more common properties. Metrics can be used to accurately describe, analyse and visualise the collections in numeric terms.

As a dimension can only have one value for any given ObjectGroup, this effectively means the collections being described must be split into as many ObjectGroups as there are values for that dimension. If more than one property is designated as a dimension, then there must be an ObjectGroup for every valid combination of values in those dimensions.

Figure 2 shows an example where two terms - Taxon and GeographicOrigin - have been designated as dimensions. Each dimension has two possible values, so the collection must be split into four ObjectGroups, one for each combination of those two dimensions.

CollectionWithTwoDimensions

Figure 2: Breaking down a collection using two dimensions.

Metrics (represented by the MeasurementOrFact class) are attached to each ObjectGroup, in this example ‘object quantity’. This means that it’s possible to calculate the number of objects within a dimension (e.g. 175 objects from Tanzania or 375 reptiles), across the two dimensions (e.g. 200 Ethiopian mammals, 100 Tanzanian reptiles), and for the collection as a whole by aggregating numbers from all ObjectGroups.

In the real world however a collection is likely to have many more taxa, and many more geographic origins, and so the number of ObjectGroups in the grid would probably be considerably larger. Also, if a third dimension is introduced then the number of ObjectGroups is also multiplied by the number of values in that new dimension. So there are some practical limitations on the number of dimensions that can be used (and the number of values in each dimension), related to the manageability of the data but also primarily the amount of time and effort required to estimate collection metrics at such a detailed and granular level.

Within these constraints, using dimensions is most effective in scenarios for showing collections demographics or inventories, where:

  • a structured breakdown of the collection is needed using a small number of properties
  • there is a need for dynamic, quantitative reporting across those properties
  • there is sufficient resource available to estimate or calculate metrics for a larger number of ObjectGroups, and maintain a more complex dataset

Combining dimensions and associations

In practice, it is likely that most use cases of the Latimer Core model would be best suited by a combination of terms as dimensions, and terms as associations. For example, an institution might describe its collection by applying an organisational hierarchy as a dimension, so that there is one ObjectGroup for each OrganisationalUnit defined by the institution, with associated metrics. To each of those ObjectGroups, there might then be a number of other terms attached as associations to further describe and reflect the scope (e.g. taxonomic or geographic) of the objects within the group.

Model options for metrics

There are two main options for how metrics can be used within the Latimer Core data model, each with strengths and weaknesses.

Option 1: Metrics attached to the ObjectGroup

The first option (Figure 3) is to attach the metric directly to the ObjectGroup. This means that you can get an accurate figure for the total number of objects within the ObjectGroup, but not for the number of objects according to each of the terms (unless the terms are designated as dimensions, and the ObjectGroup split into multiple ObjectGroups, as described in the previous section). This option tends to be better suited to the dimensional model, for providing precise statistics.

metric_option1

Figure 3: Attaching metrics directly to the ObjectGroup class.

Option 2: Metrics attached to the relationship between the ObjectGroup and each property

The second option (Figure 4) is to attach metrics to the relationships between the ObjectGroup and the terms (designated as associations). For example, an ObjectGroup has 100 objects from Tanzania, 150 objects from Ethiopia, 75 reptiles and 125 mammals. This means that you have accurate figures relating to each property, but do not know how many objects there are overall - there is no denominator. In practice, this can be achieved by embedding instances of the MeasurementOrFact in a ResourceRelationship used to create the relationship between the ObjectGroup and the term.

metric_option2

Figure 4: Attaching metrics to links between classes, in order to quantify or qualify the relationship.

This option is better for a less structured, more graph-like approach to modelling the collections. It does avoid the need to break down into a greater number of more granular ObjectGroups, as per the dimensional model used in Option 1, but has limitations in providing accurate quantitative data.

As with the two options of using properties as associations or dimensions, there is potential to combine these two approaches to suit the use case.

Key points

  • Most terms related to an ObjectGroup can be used as either an association or a dimension.
  • Associations require less effort and are good for descriptive purposes, but bad for quantitative reporting.
  • Dimensions are good for quantitative reporting, but require more effort, and have limitations in how many can be applied.
  • Metrics can potentially be attached either directly to the ObjectGroup, or to the relationships between the ObjectGroup and its terms.