Nicky Nicolson edited this page Oct 22, 2015 · 10 revisions

This (work-in-progress) page defines terms used elsewhere in the project documentation.

#Glossary

| Term | Definition |
| --- | --- |
| Configuration | A Spring XML configuration which defines the datasource to be exposed for matching, and how that data is transformed and matched. Also allows definition of how matches should be reported. |
| Datasource | A tabular datasource, read from either a JDBC-accessible database or a delimited tabular data file. |
| Authority datasource | A tabular datasource, which is transformed and stored. |
| Query datasource | A tabular datasource, which is transformed and matched against the authority datasource. |
| Property | A field in a datasource. |
| Transformer | A JavaBean which implements the transform() method to read in a string value from a property and return a transformed value. Managed in a separate GitHub project: String-Transformers. Multiple transformers can be applied to a single property. |
| Matcher | A JavaBean which implements the match() method: it accepts two (transformed) values and returns a Boolean flag to indicate whether they match. Only one matcher can be applied to a property, but a composite matcher allows the definition of a set of matchers where either all, or at least one, of the component matchers must return true for the composite matcher to return true. |
| Reporter | A reporter writes a text file report containing details of the matches found. |
| Dictionary | A file listing pairs of terms that are considered equivalent. Some transformers read in a dictionary data file. |
| Record linkage | The computer science term for matching data records based on the contents of their constituent fields (due to a lack of shared identifiers). |
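
The Transformer and Matcher entries above can be illustrated with a minimal Java sketch. It is not the project's code: the interface shapes, the class name CompositeMatcherSketch, and the helpers transformAll, allOf and anyOf are assumptions made for illustration. Only the method names transform() and match() come from the definitions above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CompositeMatcherSketch {

    /** Assumed shape of a transformer: one string in, one transformed string out. */
    interface Transformer {
        String transform(String value);
    }

    /** Assumed shape of a matcher: two transformed values in, a match flag out. */
    interface Matcher {
        boolean match(String a, String b);
    }

    /** Applies several transformers to one property value, in list order. */
    static String transformAll(String value, List<Transformer> transformers) {
        String result = value;
        for (Transformer t : transformers) {
            result = t.transform(result);
        }
        return result;
    }

    /** Hypothetical composite matcher: true only if ALL components return true. */
    static Matcher allOf(List<Matcher> components) {
        return (a, b) -> components.stream().allMatch(m -> m.match(a, b));
    }

    /** Hypothetical composite matcher: true if AT LEAST ONE component returns true. */
    static Matcher anyOf(List<Matcher> components) {
        return (a, b) -> components.stream().anyMatch(m -> m.match(a, b));
    }

    public static void main(String[] args) {
        // Two transformers chained on a single property: trim, then lower-case.
        Transformer trim = String::trim;
        Transformer lowerCase = s -> s.toLowerCase(Locale.ROOT);
        List<Transformer> transformers = Arrays.asList(trim, lowerCase);

        String authorityValue = transformAll("  Smith ", transformers);
        String queryValue = transformAll("SMITHE", transformers);

        // Two illustrative component matchers: exact equality, and "one value
        // is a prefix of the other".
        Matcher exact = String::equals;
        Matcher prefix = (a, b) -> a.startsWith(b) || b.startsWith(a);
        List<Matcher> components = Arrays.asList(exact, prefix);

        System.out.println("all components: " + allOf(components).match(authorityValue, queryValue));
        System.out.println("any component:  " + anyOf(components).match(authorityValue, queryValue));
    }
}
```

In this sketch the "all" composite rejects the pair because the exact matcher fails, while the "any" composite accepts it because the prefix matcher succeeds.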

#Example

| ID | Name | DateOfBirth | Address |
| --- | --- | --- | --- |
| 1 | John Smith | 1970-01-01 | |
| 2 | J.Smith | 01.01.1970 | |
| 3 | John H. Smith | 01.Jan.1970 | |

In this example, the whole table is the datasource, and its properties are Name, DateOfBirth and Address. We can specify transformers that operate on the property values to overcome known problems: here we might apply a SurnameExtractorTransformer to Name and a YearExtractorTransformer to DateOfBirth. Because every property value is first run through its defined transformation step, we can then specify matchers that decide whether two transformed values are considered a match. For instance, we could define an IntegerMatcher on the year extracted from DateOfBirth, with a tolerance of +/- 1.
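
The walk-through above can be sketched in Java as follows. The three static methods are simplified stand-ins for SurnameExtractorTransformer, YearExtractorTransformer and IntegerMatcher, not their real implementations. The class name ExampleTableSketch and the rules used (surname is the last token, year is the first run of four digits) are assumptions chosen to fit the sample rows.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExampleTableSketch {

    private static final Pattern FOUR_DIGITS = Pattern.compile("\\d{4}");

    /** Stand-in surname extractor: assumes the surname is the last token of the name. */
    static String extractSurname(String name) {
        // Split on whitespace and full stops, so "J.Smith" yields "J" and "Smith".
        String[] tokens = name.trim().split("[\\s.]+");
        return tokens[tokens.length - 1];
    }

    /** Stand-in year extractor: returns the first run of four digits in the value. */
    static int extractYear(String date) {
        Matcher m = FOUR_DIGITS.matcher(date);
        if (!m.find()) {
            throw new IllegalArgumentException("No four-digit year in: " + date);
        }
        return Integer.parseInt(m.group());
    }

    /** Stand-in integer matcher: true if the values differ by at most the tolerance. */
    static boolean integersMatch(int a, int b, int tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        // ID, Name, DateOfBirth from the example table (Address is empty there).
        String[][] rows = {
            {"1", "John Smith", "1970-01-01"},
            {"2", "J.Smith", "01.01.1970"},
            {"3", "John H. Smith", "01.Jan.1970"},
        };

        // Treat row 1 as the authority record and compare every row against it.
        String authoritySurname = extractSurname(rows[0][1]);
        int authorityYear = extractYear(rows[0][2]);

        for (String[] row : rows) {
            String surname = extractSurname(row[1]);
            int year = extractYear(row[2]);
            boolean match = surname.equals(authoritySurname)
                    && integersMatch(year, authorityYear, 1);
            System.out.println(row[0] + ": surname=" + surname
                    + ", year=" + year + ", matches row 1=" + match);
        }
    }
}
```

Under these assumptions all three rows reduce to the same surname and year after transformation, even though their raw Name and DateOfBirth values differ. That is the point of running transformers before matchers.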
