Glossary
Nicky Nicolson edited this page Oct 22, 2015 · 10 revisions
This (work-in-progress) page defines terms used elsewhere in the project documentation.
# Glossary
Term | Definition |
---|---|
Configuration | A Spring XML configuration which defines the datasource to be exposed for matching, and how that data is transformed and matched. Also allows definition of how matches should be reported. |
Datasource | A tabular datasource, read from either a JDBC-accessible database or a delimited tabular data file. |
Authority datasource | A tabular datasource, which is transformed and stored. |
Query datasource | A tabular datasource, which is transformed and matched against the authority datasource. |
Property | A field in a datasource |
Transformer | A JavaBean which implements the transform() method to read in a string value from a property and return a transformed value. Transformers are managed in a separate GitHub project: String-Transformers. Multiple transformers can be applied to a single property. |
Matcher | A JavaBean which implements the match() method: it accepts two (transformed) values and returns a Boolean flag indicating whether they match. Only one matcher can be applied to a property, but a composite matcher allows the definition of a set of component matchers, where either all or at least one of them must return true for the composite matcher to return true. |
Reporter | A reporter writes a text file report containing details of the matches found. |
Dictionary | A file listing pairs of terms that are considered equivalents. Some transformers read in a dictionary data file. |
Record linkage | The computer science term for matching data records based on the contents of their constituent fields, used when the records lack shared identifiers. |
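
The Transformer and Matcher definitions above can be sketched roughly as follows. This is a minimal illustration only: the interface names, method signatures and the helper methods `transformAll`, `allOf` and `anyOf` are assumptions made for this sketch, not the project's actual API.

```java
import java.util.List;
import java.util.Locale;

// Illustrative only: these interfaces mirror the glossary definitions,
// but their names and signatures are assumptions, not the project's API.
interface Transformer {
    String transform(String value);
}

interface Matcher {
    boolean match(String a, String b);
}

final class GlossarySketch {

    // Run one property value through several transformers, in order.
    static String transformAll(List<Transformer> transformers, String value) {
        String result = value;
        for (Transformer t : transformers) {
            result = t.transform(result);
        }
        return result;
    }

    // Composite matcher that requires every component matcher to return true.
    static Matcher allOf(List<Matcher> components) {
        return (a, b) -> components.stream().allMatch(m -> m.match(a, b));
    }

    // Composite matcher that requires at least one component matcher to return true.
    static Matcher anyOf(List<Matcher> components) {
        return (a, b) -> components.stream().anyMatch(m -> m.match(a, b));
    }

    public static void main(String[] args) {
        // Two transformers applied to the same property: trim, then lower-case.
        List<Transformer> chain = List.of(
                String::trim,
                s -> s.toLowerCase(Locale.ROOT));

        Matcher exact = String::equals;
        Matcher sameInitial = (a, b) ->
                !a.isEmpty() && !b.isEmpty() && a.charAt(0) == b.charAt(0);

        String left = transformAll(chain, "  Smith ");   // "smith"
        String right = transformAll(chain, "SMYTH");     // "smyth"

        System.out.println(allOf(List.of(exact, sameInitial)).match(left, right)); // false
        System.out.println(anyOf(List.of(exact, sameInitial)).match(left, right)); // true
    }
}
```

Here the two values differ after transformation, so the "all" composite fails while the "at least one" composite succeeds on the shared initial.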
# Example
ID | Name | DateOfBirth | Address |
---|---|---|---|
1 | John Smith | 1970-01-01 | |
2 | J.Smith | 01.01.1970 | |
3 | John H. Smith | 01.Jan.1970 | |
In this example, the whole table is the datasource. The properties are Name, DateOfBirth and Address. We may specify transformers to operate on the property values and overcome known problems, such as the inconsistent name and date formats shown here: for example, we might apply a SurnameExtractorTransformer to Name and a YearExtractorTransformer to DateOfBirth. Because every property value is run through a defined transformation step, we can then specify matchers that decide whether two transformed values are considered a match. For instance, we could define an IntegerMatcher on the year extracted from DateOfBirth, with a tolerance of +/- 1, so that 1970 would match 1969, 1970 or 1971.
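
As a rough sketch of the example above, the following code applies simplified stand-ins for the surname and year transformers to the three rows and then compares years with a tolerance. The class `ExampleSketch` and the methods `surnameOf`, `yearOf` and `withinTolerance` are hypothetical; the project's real SurnameExtractorTransformer, YearExtractorTransformer and IntegerMatcher may be implemented quite differently.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-ins for the transformers and matcher named above;
// the project's real classes may behave differently.
final class ExampleSketch {

    // A run of exactly four digits, assumed here to be the year.
    private static final Pattern YEAR = Pattern.compile("(?<!\\d)\\d{4}(?!\\d)");

    // Surname transformer: keep the last token after splitting on spaces and dots.
    static String surnameOf(String name) {
        String[] tokens = name.trim().split("[\\s.]+");
        return tokens.length == 0 ? "" : tokens[tokens.length - 1];
    }

    // Year transformer: extract the first four-digit run, if there is one.
    static OptionalInt yearOf(String date) {
        Matcher m = YEAR.matcher(date);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group())) : OptionalInt.empty();
    }

    // Integer matcher: two values match if they differ by at most the tolerance.
    static boolean withinTolerance(int a, int b, int tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"1", "John Smith", "1970-01-01"},
            {"2", "J.Smith", "01.01.1970"},
            {"3", "John H. Smith", "01.Jan.1970"},
        };

        // Show the transformed values for every record.
        for (String[] row : rows) {
            System.out.println(row[0] + ": surname=" + surnameOf(row[1])
                    + ", year=" + yearOf(row[2]).getAsInt());
        }

        // Compare the first record against the others on the transformed values.
        for (int i = 1; i < rows.length; i++) {
            boolean sameSurname = surnameOf(rows[0][1]).equals(surnameOf(rows[i][1]));
            boolean closeYear = withinTolerance(
                    yearOf(rows[0][2]).getAsInt(), yearOf(rows[i][2]).getAsInt(), 1);
            System.out.println("1 vs " + rows[i][0] + ": match=" + (sameSurname && closeYear));
        }
    }
}
```

All three rows reduce to the surname `Smith` and the year `1970`, so each comparison reports a match even though the raw values differ.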