TG2 - Parameterized #178

Closed
ArthurChapman opened this issue May 13, 2019 · 35 comments

@ArthurChapman
Collaborator

Having a look at the tests, we now seem to have added Parameterized to (virtually) every test where we have a vocabulary - even where (e.g. #62) the Vocabulary is an ISO Standard.

I am not sure that we have thought this through for each case.

  1. What are the parameters that need to be set (sometimes it appears to be the "specified source authority", in others an upper or lower limit - date, elevation, etc.)
  2. I think we need a default in most cases if the Parameter is not set. We have put that in some. i.e. a default vocabulary (e.g. TGN) or value (1 Jan 1753), etc.
  3. There is a lot of extra work when running the tests if one has to set a parameter for lots and lots (42) of tests.
    I think we need to make it clear (in Notes?) what the Parameter is that needs to be set. It is not clear in some of the tests where we have Parameterized. In most it is specifying the Source Authority, in others an upper or lower limit.
    Are we overusing parameterization? Do we need another field explaining what the parameter is?
@tucotuco
Member

I have to admit that the thought of a field for parameters occurred to me in passing as well. I think it would help make things clear. The field could contain a good descriptive name for the parameter(s) and the default value(s).
Default values for vocabularies may be tough in some cases, as they do not exist, or are not vetted community wide, or they are not apt for inclusion as they are (e.g., TGN). What do we do in those cases?
I definitely do not think we are over-using parametrization. I think it is super important to make the tests flexible.
Note that test suites will have to include parameter values as well.

@ArthurChapman ArthurChapman added Parameterized Test requires a parameter question and removed Parameterized Test requires a parameter question labels May 13, 2019
@Tasilee
Collaborator

Tasilee commented May 15, 2019

OK. This is what I was trying to get at with my comment on #63 about the correlation of vocabs and parameterized.

@Tasilee
Collaborator

Tasilee commented May 16, 2019

OK, in checking the first few of Parameterized with the long-suffering @ArthurChapman, there are syntax and content issues we need to standardize before I feel comfortable about making more changes (42 all up at the moment). So far, Parameter(s) edited in the table as examples:

#163 Specified source authority, default = http://rs.gbif.org/vocabulary/gbif/rank.xml
#162 Specified source authority, default = http://rs.gbif.org/vocabulary/gbif/rank.xml
#141 earliest_year, default = 1700; latest_year, default = current year
#139 Specified source authority, default = The Getty Thesaurus of Geographic Names (TGN: http://www.getty.edu/research/tools/vocabularies/tgn/index.html)

This raises

  1. Are we happy with "Specified source authority" and syntax for terms such as "XXX_YY"?
  2. Do we really need to use "optional"? I would think not as if needed, covered in Notes?
  3. Do we have a label as in "The Getty Thesaurus of Geographic Names" or "TGN" and/or a URL that provides more information if not the API address??
  4. We note re-use of References in Parameter(s), which may be fine?
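
To make the "parameter name plus default" idea concrete, here is a minimal Java sketch of how an implementation might resolve a parameter to either a supplied value or its documented default. It is only an illustration: the class `ParameterDefaults` and the method `resolve` are hypothetical names, not part of any TG2 implementation; the parameter names and defaults are the ones listed for #141 above.

```java
import java.time.Year;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper, not part of any TG2 implementation.
class ParameterDefaults {
    // Parameter name -> supplier of its default value (names and defaults from #141 above).
    static final Map<String, Supplier<Integer>> DEFAULTS = new LinkedHashMap<>();
    static {
        DEFAULTS.put("earliest_year", () -> 1700);
        // "current year" is evaluated when the test runs, not when it is defined.
        DEFAULTS.put("latest_year", () -> Year.now().getValue());
    }

    // Returns the caller-supplied value if present, otherwise the documented default.
    static int resolve(String name, Map<String, Integer> supplied) {
        Integer value = supplied.get(name);
        if (value != null) {
            return value;
        }
        Supplier<Integer> fallback = DEFAULTS.get(name);
        if (fallback == null) {
            throw new IllegalArgumentException("Unknown parameter: " + name);
        }
        return fallback.get();
    }

    public static void main(String[] args) {
        Map<String, Integer> none = Collections.emptyMap();
        Map<String, Integer> custom = Collections.singletonMap("earliest_year", 1600);
        System.out.println(resolve("earliest_year", none));   // default applies
        System.out.println(resolve("earliest_year", custom)); // supplied value wins
        System.out.println(resolve("latest_year", none));     // current year at run time
    }
}
```

Using a supplier rather than a constant is one way to handle a default such as "current year", which cannot be written down as a fixed value in the Parameter(s) field.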

--

@ArthurChapman
Collaborator Author

There has been some discussion around default values for parameterized tests

  1. I would like to see a default in all cases where possible (even if these may change down the line as Vocabularies of Value are built).
  2. There has been discussion on the default value for year (of eventdate, year, etc.) as opposed to taxonomic values, which should be Linnean 1953. In some cases we are using (suggesting) 1700 as a lower limit of collecting dates for biological specimens (although @tucotuco has suggested that there be no default value). I think 1700 is too late for a default value, as there are hundreds of thousands, if not millions, of collections that predate 1700 - especially in Europe, Asia, etc. But what date should we set? 1650, 1600? Suggestions please - we need to finalize these.

@ArthurChapman
Collaborator Author

To answer @Tasilee: should we use a link to a web address, a name ("The Getty Thesaurus of Geographic Names" or "TGN") or a link to an API? In the parameter field, I think it should be an API if possible for the default. Then, in the References, a full name and web link to the vocabulary.

  1. I am happy with "Specified source authority" . Not sure what you mean by "XXX_YY" - but if you mean NOT_FOUND, NO_REPORT, etc. - I am happy with these. Perhaps if there is confusion, we should add them to Vocabulary.
  2. Need to see an example - we do say something like does not extend beyond optionally provided begin and end dates. In this case - I don't think it is necessary - but make sure there is a default - if they want to set it then it is an option, if not it is the default
  3. Should have an API, The full name and web address should be in the references.
  4. Not sure what you mean here.

@Tasilee
Collaborator

Tasilee commented Jun 5, 2019

Thanks @ArthurChapman. I agree that we should supply a default even if it is 'best guess' as that will be helpful for implementers as a starting position.

Regarding the default minimum year, I think you mean '1753' and not '1953'? With my limited taxonomic experience, '1600' would seem a reasonable 'flag-raising' point, but my reservation is that I tend to err toward false positives rather than false negatives. Meaning, I would rather raise a flag for those below 1753 than not flag those between 1600 and 1753.

The 'XXX-YYY' was to cover terms in the 'Expected response' such as 'NOT_EMPTY', 'NOT_COMPLIANT', 'NO_REPORT' etc. I will check that these are in the vocab, as they have grown with the implementation of the 'Expected responses'.

I agree that the Parameter defaults should ideally point to an API, but a) some don't exist, b) some exist but may not be tightly coupled to a 'standard' and c) some are hard to find.

My note about References in Parameters means that in some cases, we use the references as a link to defaults. In other cases, I have taken info from the 'Expected response', for example if there is a mention of 'authority'.

@ArthurChapman
Collaborator Author

Yes 1753. There is no logical reason for selecting 1753 for collections - there is for taxonomy. I am not sure where we got 1700 and what the logic was for that. 1600 predates the years of major scientific exploration (Spanish, Portuguese, British and French).

@chicoreus
Collaborator

Tests should only be parameterized when we have identified user stories in the areas that TG3 examined that clearly have different parts of the community wishing to use different parameters. The only two valid cases that come to my mind right off are application of a particular national taxonomic authority for tests involving scientific names, and specifications of the earliest valid date for identifications or eventDates, where particular data sets are known by their users to have earliest valid dates.

Parameters must not point to hypothetical resources that are not available to implementors.

@chicoreus
Collaborator

@ArthurChapman, yes, if we specify that a test is parameterized, we must specify a default value.

I suspect that the identifiers (guids) for tests should only apply to implementations of those tests that use the default parameter values, and that implementations which take other values should use different guids to allow for machine comparison of results, but as the intent of parameters is to change the test behavior at runtime, that might significantly complicate implementation. One alternative (thinking in terms of annotated Java methods a la the filtered push implementations) would be to have one identifier refer to a test with the default parameter, and another identifier refer to the same test, but with any other value for the parameter (Java implementation on the order of

@Provides("baf2a90b-af45-4f1a-839f-47126743a48a")
public DQResponse<AmendmentValue> amendmentYearStandardized(
        @ActedUpon("dwc:year") String year)
{
    // Default-parameter variant: delegates with the default minimum year.
    Integer minimumYear = 1753;
    return amendmentYearStandardized(year, minimumYear);
}

@Provides("ab37fd2a-fe95-4ab6-8a0c-e40ea3f97bb4")
public DQResponse<AmendmentValue> amendmentYearStandardized(
        @ActedUpon("dwc:year") String year, Integer minimumYear)
{
    // actual test implementation
}

), where the first method uses the guid currently specified for the test, and the second method uses a guid that we would need to specify for parameterized implementations.
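
For anyone who wants to try the two-identifier idea without the annotation framework, the following is a self-contained Java sketch of the same pattern. Everything except the two GUIDs and the default of 1753 is an assumption made for illustration: the class `YearRangeTest`, the `Result` type, the status strings, and the stand-in logic (a simple "year is not earlier than minimumYear" check rather than the real test) are all hypothetical.

```java
// Simplified, hypothetical types; the real annotations and DQResponse are omitted.
class YearRangeTest {
    // GUIDs taken from the two @Provides annotations above.
    static final String GUID_DEFAULT = "baf2a90b-af45-4f1a-839f-47126743a48a";
    static final String GUID_PARAMETERIZED = "ab37fd2a-fe95-4ab6-8a0c-e40ea3f97bb4";
    static final int DEFAULT_MINIMUM_YEAR = 1753;

    // A result tagged with the identifier of the variant that produced it.
    static final class Result {
        final String guid;
        final String status; // illustrative status strings only

        Result(String guid, String status) {
            this.guid = guid;
            this.status = status;
        }
    }

    // Default variant: reported under the guid currently specified for the test.
    static Result run(String year) {
        return evaluate(GUID_DEFAULT, year, DEFAULT_MINIMUM_YEAR);
    }

    // Parameterized variant: reported under the second guid.
    static Result run(String year, int minimumYear) {
        return evaluate(GUID_PARAMETERIZED, year, minimumYear);
    }

    // Stand-in logic (assumption): compliant if the year parses and is >= minimumYear.
    private static Result evaluate(String guid, String year, int minimumYear) {
        if (year == null) {
            return new Result(guid, "NOT_EVALUATED");
        }
        try {
            int parsed = Integer.parseInt(year.trim());
            return new Result(guid, parsed >= minimumYear ? "COMPLIANT" : "NOT_COMPLIANT");
        } catch (NumberFormatException e) {
            return new Result(guid, "NOT_EVALUATED");
        }
    }

    public static void main(String[] args) {
        Result a = run("1700");
        Result b = run("1700", 1600);
        System.out.println(a.guid + " " + a.status);
        System.out.println(b.guid + " " + b.status);
    }
}
```

Tagging each result with the guid of the variant that produced it is what would let two result sets be compared by machine: results under the first guid are always comparable, while results under the second are only comparable if the parameter values are also recorded.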

@Tasilee
Collaborator

Tasilee commented Aug 16, 2019

@ArthurChapman and I have been discussing 'Needs work' tagged tests and resolved a few, but there are three remaining. Also, a question to the rest of you about the Expected Response regarding specified source authority. Should we

  1. leave the phrase specified source authority or should we use
  2. bdq:sourceAuthority ?

@chicoreus
Collaborator

@Tasilee the updates to make the parameter values structured and consistent are great.

@chicoreus
Collaborator

Significant remaining problem: A very large number of the tests which take parameters should not be parameterized. I've noted this on #20: only tests for which we have use cases where different user communities will expect the tests to behave in different ways should be parameterized (such as a country wishing to validate scientific names against a national list rather than a global one). We must not specify parameters that point implementors to a resource from which the controlled vocabulary for a particular test can be found, that is something for the notes. When the specification says, e.g. compliant if matching ISO vocabulary x, then the implementor must use that vocabulary, and where they get it and how they get it is an implementation detail, not a parameter.

All of the tests that have parameters need careful review to see if there is a clear use case for different users to expect different behaviors of the test for different uses, not whether or not there are multiple possible sources that could be used for some vocabulary.
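
As a small illustration of a test whose vocabulary is fixed by the specification, here is a Java sketch of a country-code check that takes no parameter at all. It is a sketch under assumptions: the class and method names are hypothetical, and using the JDK's built-in `Locale.getISOCountries()` list is just one possible way for an implementor to obtain the ISO two-letter codes, not something the test specifications prescribe.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical example of a non-parameterized validation.
class CountryCodeValidation {
    // The vocabulary is fixed by the specification; where it is obtained is an
    // implementation detail. Here (assumption) it comes from the JDK.
    private static final Set<String> ISO_CODES =
            new HashSet<>(Arrays.asList(Locale.getISOCountries()));

    // No source-authority argument: callers cannot swap in another vocabulary.
    static boolean isCompliant(String countryCode) {
        return countryCode != null && ISO_CODES.contains(countryCode);
    }

    public static void main(String[] args) {
        System.out.println(isCompliant("AU"));
        System.out.println(isCompliant("Australia"));
    }
}
```

The point of the sketch is the method signature: an implementor could replace the JDK list with a download from an ISO-derived service without changing what the test means.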

@chicoreus
Collaborator

chicoreus commented Aug 20, 2019

We have 41 tests that specify parameters. It looks to me like only 18 of those are actually candidates for parameterization, and each of these needs careful consideration and identification of the use cases that require the test to be parameterized.

No. Name Parameter
84 VALIDATION_YEAR_OUTOFRANGE bdq:earliestDate = 1600, bdq:latestDate = current year
107 VALIDATION_MINDEPTH-MAXDEPTH_OUTOFRANGE bdq:minimumValidDepthInMeters = 0, bdq:maximumValidDepthInMeters = 11000
112 VALIDATION_MAXELEVATION_OUTOFRANGE bdq:minimumValidElevationInMeters = -423, bdq:maximumValidElevationInMeters = 8850
122 VALIDATION_GENUS_NOTFOUND bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species)
123 VALIDATION_CLASSIFICATION_AMBIGUOUS bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species)
22 VALIDATION_PHYLUM_NOTFOUND bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species)
28 VALIDATION_FAMILY_NOTFOUND bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species)
45 AMENDMENT_POLYNOMIAL_STANDARDIZED bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species)
46 VALIDATION_POLYNOMIAL_NOTSTANDARD bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species)
57 AMENDMENT_TAXONID_FROM_TAXON bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species)
70 VALIDATION_TAXON_AMBIGUOUS bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species)
71 AMENDMENT_SCIENTIFICNAME_FROM_TAXONID bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species)
77 VALIDATION_CLASS_NOTFOUND bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species)
81 VALIDATION_KINGDOM_NOTFOUND bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species)
83 VALIDATION_ORDER_NOTFOUND bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species)
76 VALIDATION_DATEIDENTIFIED_OUTOFRANGE Default values: bdq:earliestDate = 1753-01-01, bdq:latestDate = current day
36 VALIDATION_EVENTDATE_OUTOFRANGE Default values: bdq:earliestValidDate = 1600, bdq:latestValidDate = current year
39 VALIDATION_MINELEVATION_OUTOFRANGE Default values: bdq:minimumValidElevationInMeters = -428, bdq:maximumValidElevationInMeters = 8850
102 AMENDMENT_GEODETICDATUM_ASSUMEDDEFAULT (but not: bdq:sourceAuthority (default = http://epsg.io/))

@chicoreus
Collaborator

chicoreus commented Aug 20, 2019

The following tests have parameters and look to me like they very unambiguously must not be parameterized. The resources mentioned should be moved either into the specification or the notes, and not specified as a parameter.

No. Name Parameter
106 AMENDMENT_IDENTIFICATIONQUALIFIER_FROM_TAXON bdq:sourceAuthority (default = https://dwc.tdwg.org/terms/#identificationQualifier)
59 VALIDATION_GEODETICDATUM_NOTSTANDARD bdq:sourceAuthority (default = http://epsg.io/)
60 AMENDMENT_GEODETICDATUM_STANDARDIZED bdq:sourceAuthority (default = http://epsg.io/)
51 VALIDATION_COORDINATES_TERRESTRIALMARINE bdq:sourceAuthority (default = http://irmng.org)
162 VALIDATION_TAXONRANK_NOTSTANDARD bdq:sourceAuthority (default = http://rs.gbif.org/vocabulary/gbif/rank.xml)
163 AMENDMENT_TAXONRANK_STANDARDIZED bdq:sourceAuthority (default = http://rs.gbif.org/vocabulary/gbif/rank.xml)
104 VALIDATION_BASISOFRECORD_NOTSTANDARD bdq:sourceAuthority (default = http://rs.tdwg.org/dwc/terms/index.htm#basisOfRecord)
63 AMENDMENT_BASISOFRECORD_STANDARDIZED bdq:sourceAuthority (default = http://rs.tdwg.org/dwc/terms/index.htm#basisOfRecord)
133 AMENDMENT_LICENSE_STANDARDIZED bdq:sourceAuthority (default = https://creativecommons.org/)
38 VALIDATION_LICENSE_NOTSTANDARD bdq:sourceAuthority (default = https://creativecommons.org/)
97 VALIDATION_IDENTIFICATIONQUALIFIER_DETECTED bdq:sourceAuthority (default = https://dwc.tdwg.org/terms/#identificationQualifier)
115 AMENDMENT_OCCURRENCESTATUS_STANDARDIZED bdq:sourceAuthority (default = https://dwc.tdwg.org/terms/#occurrenceStatus)
116 VALIDATION_OCCURRENCESTATUS_NOTSTANDARD bdq:sourceAuthority (default = https://dwc.tdwg.org/terms/#occurrenceStatus)
20 VALIDATION_COUNTRYCODE_NOTSTANDARD bdq:sourceAuthority (default = https://restcountries.eu/#api-endpoints-list-of-codes)
48 AMENDMENT_COUNTRYCODE_STANDARDIZED bdq:sourceAuthority (default = https://restcountries.eu/#api-endpoints-list-of-codes)
62 VALIDATION_COUNTRY_COUNTRYCODE_INCONSISTENT bdq:sourceAuthority (default = https://restcountries.eu/#api-endpoints-list-of-codes)
73 AMENDMENT_COUNTRYCODE_FROM_COORDINATES bdq:sourceAuthority (default = https://restcountries.eu/#api-endpoints-list-of-codes)
50 VALIDATION_COORDINATES_COUNTRYCODE_INCONSISTENT bdq:sourceAuthority (default = https://www.iso.org/obp/ui)
118 AMENDMENT_GEOGRAPHY_STANDARDIZED bdq:sourceAuthority (default = The Getty Thesaurus of Geographic Names (TGN: http://www.getty.edu/research/tools/vocabularies/tgn/index.html))
139 VALIDATION_GEOGRAPHY_NOTSTANDARD bdq:sourceAuthority (default = The Getty Thesaurus of Geographic Names (TGN: http://www.getty.edu/research/tools/vocabularies/tgn/index.html))
21 VALIDATION_COUNTRY_NOTSTANDARD bdq:sourceAuthority (default = The Getty Thesaurus of Geographic Names (TGN: http://www.getty.edu/research/tools/vocabularies/tgn/index.html))
95 VALIDATION_GEOGRAPHY_AMBIGUOUS bdq:sourceAuthority (default = The Getty Thesaurus of Geographic Names (TGN: http://www.getty.edu/research/tools/vocabularies/tgn/index.html))

@ArthurChapman
Collaborator Author

@chicoreus I will look at this in detail when I get back home (away at the moment), but the Geodetic Datum ones (#102, #59, #60) should be Parameterized, as different jurisdictions use different defaults (some by legislation - e.g. Brazil) and WGS84 may not always be the best default. In Brazil, for example, if no datum is specified, you can be nearly certain that the default is either SAD69(96) or SIRGAS2000 (depending on the date). Also, many jurisdictions are using Coordinate Reference Systems (CRS) rather than datums, as these are more often than not what is given on GPS units. I will check their wording later. Like you, I think we have unnecessarily made too many tests Parameterized. @tucotuco may have good reasons for some of these, but I think we need to justify each test. Perhaps there are comments with justifications under the individual tests - I will check later.
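
A rough Java sketch of what a parameterized assumed-default datum amendment could look like is given below. It is illustrative only: the class `AssumedDatum` and its methods are hypothetical, and "EPSG:4326" (WGS84) is used as the fallback purely as an assumption for the example, since the actual default is exactly what is under discussion here.

```java
// Hypothetical sketch of an assumed-default amendment with one parameter.
class AssumedDatum {
    // Assumption for illustration: WGS84, written as its EPSG code.
    static final String DEFAULT_DATUM = "EPSG:4326";

    // Variant using the default parameter value.
    static String amend(String geodeticDatum) {
        return amend(geodeticDatum, DEFAULT_DATUM);
    }

    // Variant where a jurisdiction supplies its own assumed default.
    static String amend(String geodeticDatum, String defaultDatum) {
        if (geodeticDatum == null || geodeticDatum.trim().isEmpty()) {
            return defaultDatum; // fill in only when nothing was recorded
        }
        return geodeticDatum;    // never overwrite an existing value
    }

    public static void main(String[] args) {
        System.out.println(amend(""));
        System.out.println(amend("", "SIRGAS2000"));
        System.out.println(amend("SAD69", "SIRGAS2000"));
    }
}
```

A Brazilian data set could then be run with `amend(value, "SIRGAS2000")` while other users keep the default, which is the kind of community-specific behaviour that would justify a parameter.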

@chicoreus
Collaborator

@ArthurChapman looks like #102 should be parameterized, while #59 and #60 should not. Added notes in those issues.

@chicoreus
Collaborator

I've updated the tables in the comments above accordingly, moving #102 into should be parameterized.

@ArthurChapman
Collaborator Author

ArthurChapman commented Aug 21, 2019

Having looked at your list @chicoreus for tests that "shouldn't" be Parameterized - I have the following comments.

No. Name Parameter
106 AMENDMENT_IDENTIFICATIONQUALIFIER_FROM_TAXON I think this was so people could add characters that they could look for "?", "cf." "aff." or could add others. I'd be happy either way with this one.
102 AMENDMENT_GEODETICDATUM_ASSUMEDDEFAULT As I noted in my previous comment - should be Parameterized
59 VALIDATION_GEODETICDATUM_NOTSTANDARD Should not be Parameterized
60 AMENDMENT_GEODETICDATUM_STANDARDIZED Should not be Parameterized
51 VALIDATION_COORDINATES_TERRESTRIALMARINE This one was parameterized because of two ways of checking for isMarine: 1) using GIS/Google Maps to determine if on land or not, 2) using a list of marine species and checking if in that list or not. We could decide to use only one method and then remove from Parameterized
162 VALIDATION_TAXONRANK_NOTSTANDARD I would be happy for us to decide to go with the GBIF Rank Vocabulary (there is no real alternative) and remove Parameterization
163 AMENDMENT_TAXONRANK_STANDARDIZED I would be happy for us to decide to go with the GBIF Rank Vocabulary (there is no real alternative) and remove Parameterization
104 VALIDATION_BASISOFRECORD_NOTSTANDARD I would be happy for us to decide to go with the DwC recommended (it can always be formalised later) and remove Parameterization
63 AMENDMENT_BASISOFRECORD_STANDARDIZED I would be happy for us to decide to go with the DwC recommended (it can always be formalised later) and remove Parameterization
133 AMENDMENT_LICENSE_STANDARDIZED Problem I see here is that we are following dcterms:license - which could be broader than just Creative Commons. Do we wish to restrict to Creative Commons, or allow other license conditions to be valid, and thus allow someone to choose a different vocabulary?
38 VALIDATION_LICENSE_NOTSTANDARD Problem I see here is that we are following dcterms:license - which could be broader than just Creative Commons. Do we wish to restrict to Creative Commons, or allow other license conditions to be valid, and thus allow someone to choose a different vocabulary?
97 VALIDATION_IDENTIFICATIONQUALIFIER_DETECTED I think this was so people could add characters that they could look for "?", "cf." "aff." or could add others. I'd be happy either way with this one.
115 AMENDMENT_OCCURRENCESTATUS_STANDARDIZED Currently, DwC only recommends "present" "absent". I understand some would like this broadened. But as it stands with only two options, I don't see why it should be Parameterized unless a community (invasives?) want to use a different vocabulary. @tucotuco parameterized this - what was the thinking? A paper currently in press is recommending modification to include a third term "doubtful" - but if this is accepted (or not) - I only see the one vocabulary that we would be using - and hopefully it will be eventually formalised beyond a mere DwC recommendation. I thus don't see a strong justification for Parameterization
116 VALIDATION_OCCURRENCESTATUS_NOTSTANDARD See comment above.
20 VALIDATION_COUNTRYCODE_NOTSTANDARD As noted in a comment under #20, I see no reason for Parameterization
48 AMENDMENT_COUNTRYCODE_STANDARDIZED As noted in a comment under #20, I see no reason for Parameterization
62 VALIDATION_COUNTRY_COUNTRYCODE_INCONSISTENT As noted in a comment under #20 we refer in the description to an ISO code, so I see no reason for Parameterization
73 AMENDMENT_COUNTRYCODE_FROM_COORDINATES This might be a more difficult one as the ISO Standard doesn't have geographic boundaries. So there may need to be some variation on what one chooses as the method for determining boundaries. We still have to decide on this.
50 VALIDATION_COORDINATES_COUNTRYCODE_INCONSISTENT Similar to #73
118 AMENDMENT_GEOGRAPHY_STANDARDIZED The geography ones, I am not sure about - we need further discussion on these and what we should use. TGN may be OK for some - Google Maps for others???? There is a discussion somewhere under an issue that I can't find at the moment.
139 VALIDATION_GEOGRAPHY_NOTSTANDARD See comment above under #118
21 VALIDATION_COUNTRY_NOTSTANDARD See comment above under #118
95 VALIDATION_GEOGRAPHY_AMBIGUOUS See comment above under #118

@ArthurChapman
Collaborator Author

Agreed @chicoreus re #102, #59 and #60. #102 Parameterized, #59 and #60 not - with bdq:sourceAuthority = http://epsg.io/

@ArthurChapman
Collaborator Author

Copied from #102 as comment applicable to more than just that test
With all tests (especially NOTSTANDARD and STANDARDIZED tests) that use an external Standard - ISO, DCMI, EPSG, or any Vocabulary - the vocabulary, standard, etc. is the bdq:sourceAuthority, and you are checking to see if the value in the record is a valid entry in the bdq:sourceAuthority (in the case of Validations) or can be amended to conform with a value in the bdq:sourceAuthority (in the case of Amendments). In nearly all cases, there is only one sourceAuthority (except, as @chicoreus mentions, with taxon names), so there is no choice of sourceAuthority needed, only the choice of a value from that sourceAuthority. In those few cases where there is a choice of sourceAuthority (taxon names), you require both 1) a choice of bdq:sourceAuthority, and 2) a choice of value within that source authority. Thus, I agree with @chicoreus that we don't need as many Parameterized tests as we have previously so tagged, unless @tucotuco has justifications for them that we have not thought of.
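
The Validation/Amendment distinction described above can be sketched in Java as follows. This is a toy illustration under assumptions: `SourceAuthority` here is a hypothetical in-memory stand-in for a bdq:sourceAuthority, the "present"/"absent" values are the DwC-recommended occurrenceStatus values mentioned earlier in this thread, and the trim-and-lowercase normalisation is just one simple way an amendment might look for a conforming value.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

class SourceAuthorityChecks {
    // Hypothetical in-memory stand-in for a bdq:sourceAuthority.
    static final class SourceAuthority {
        private final Set<String> values;

        SourceAuthority(String... values) {
            this.values = new HashSet<>(Arrays.asList(values));
        }

        boolean contains(String value) {
            return value != null && values.contains(value);
        }
    }

    // Validation: is the recorded value an entry in the source authority?
    static boolean validate(String value, SourceAuthority authority) {
        return authority.contains(value);
    }

    // Amendment: propose a conforming value, or nothing if none can be found
    // or the value already conforms.
    static Optional<String> amend(String value, SourceAuthority authority) {
        if (value == null || authority.contains(value)) {
            return Optional.empty();
        }
        String candidate = value.trim().toLowerCase(Locale.ROOT);
        return authority.contains(candidate) ? Optional.of(candidate) : Optional.empty();
    }

    public static void main(String[] args) {
        SourceAuthority occurrenceStatus = new SourceAuthority("present", "absent");
        System.out.println(validate("Present ", occurrenceStatus));
        System.out.println(amend("Present ", occurrenceStatus));
    }
}
```

In this shape, a test with a single possible authority would simply hard-wire the `SourceAuthority` instance, and only the taxon-name tests would expose it as an argument.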

@ArthurChapman
Collaborator Author

#133 and #38 I think should be Parameterized - see my comments in the table above, i.e. "Problem I see here is that we are following dcterms:license - which could be broader than just Creative Commons. Do we wish to restrict to Creative Commons, or allow other license conditions to be valid, and thus allow someone to choose a different vocabulary?" I am also concerned that some jurisdictions may legislate the licences they can use within that jurisdiction, and they may not be Creative Commons.

@Tasilee
Collaborator

Tasilee commented Aug 28, 2019

Thanks @chicoreus and @ArthurChapman. Reading through the table and your comments Arthur, here is my take on it. Maybe after a Pinot Noir or two, I would think differently.

#106 - Parameterised
#102 - Parameterised
#59 - Not parameterised
#60 - Not parameterised
#51 - Parameterised (for now)
#162 - Not parameterised
#163 - Not parameterised
#104 - Not parameterised
#63 - Not parameterised
#133 - Parameterised
#38 - Parameterised
#97 - Parameterised
#115 - Not parameterised
#116 - Not parameterised
#20 - Not parameterised
#48 - Not parameterised
#62 - Not parameterised
#73 - Parameterised
#50 - Parameterised
#118 - Parameterised
#139 - Parameterised
#21 - Parameterised
#95 - Parameterised

@tucotuco : We would value your discerning eye (or two) on this lot. I'll hold off edits for a response. I hope all is ok over there.

@ArthurChapman
Collaborator Author

I think you missed a few @Tasilee
Parameterized
#22, #28, #36, #38, #45, #46, #57, #70, #71, #76, #77, #81, #83, #84, #79, #102, #107, #112, #122, #123, #133

Not Parameterized
#20, #21, #48, #50, #51, #59, #60, #62, #63, #73, #95, #97, #104, #106, #115, #116, #118, #139, #162, #163

@tucotuco might particularly like to comment on (see my table and comments above) #51, #115, #116, #73, #50, #118, #139, #21, #95

@Tasilee
Collaborator

Tasilee commented Aug 28, 2019

@ArthurChapman: I was using the table only, so will add the missing ones here. And BTW, you also missed #39 (Parameterised), and #79 isn't parameterised:

#20 - Not parameterised
#21 - Parameterised
#22 - Parameterised
#28 - Parameterised
#36 - Parameterised
#38 - Parameterised
#39 - Parameterised
#45 - Parameterised
#46 - Parameterised
#48 - Not parameterised
#50 - Parameterised
#51 - Parameterised (for now)
#57 - Parameterised
#59 - Not parameterised
#60 - Not parameterised
#62 - Not parameterised
#63 - Not parameterised
#70 - Parameterised
#71 - Parameterised
#73 - Parameterised
#76 - Parameterised
#77 - Parameterised
#79 - Not parameterised
#81 - Parameterised
#83 - Parameterised
#84 - Parameterised
#95 - Parameterised
#97 - Parameterised
#102 - Parameterised
#104 - Not parameterised
#106 - Parameterised
#107 - Parameterised
#112 - Parameterised
#115 - Not parameterised
#116 - Not parameterised
#118 - Parameterised
#122 - Parameterised
#123 - Parameterised
#133 - Parameterised
#139 - Parameterised
#162 - Not parameterised
#163 - Not parameterised

@Tasilee
Collaborator

Tasilee commented Aug 29, 2019

I am presuming for the Not parameterised above, we move any reference to a default source authority to the References section? That is, the Parameter field is EMPTY.

@ArthurChapman
Collaborator Author

@Tasilee I guess that would make sense, however it doesn't distinguish the default or target source Authority from any other reference. Perhaps we should put them in the Reference but as "bdq:sourceAuthority=xxxxxxx" and then the other references

@Tasilee
Collaborator

Tasilee commented Aug 30, 2019

@ArthurChapman - that seems like a good strategy. I'll tackle the updates on Monday to give @tucotuco and @pzermoglio a chance to comment.

@tucotuco
Member

tucotuco commented Sep 6, 2019

Sorry folks, though I think there are a couple of good catches in this discussion, I am afraid that some of it will take us into circular reasoning. I think most of the tests that were tagged to be parametrized were correctly so. A big part of my stance on this is hidden in a comment to issue #63 (#63 (comment)). Basically, Darwin Core is not a source authority for values. But that is only part of the issue. The other is that we can't make standardizations without a thesaurus (or at least a simple lookup table) - controlled vocabularies are not enough. This is the reason we brought TG4 into existence, recognizing this fundamental need to develop the tests in tandem with the vocabularies that allow them to actually function.

Some specific comments...

I would like to challenge this statement by @chicoreus:
"Tests should only be parameterized when we have identified user stories in the areas that TG3 examined that clearly have different parts of the community wishing to use different parameters."

Why? Can't it be evident aside from the work in TG3? Are the results of TG3 exhaustive for all time?

I would also like to propose an amendment to the statement by @chicoreus:

"Parameters must not point to hypothetical resources that are not available to implementors."

Instead of "Parameters", this should be "Default sources".

@Tasilee asked: "Should we

  1. leave the phrase specified source authority or should we use
  2. bdq:sourceAuthority?"

I vote for bdq:sourceAuthority. For example, change "using a specified source authority service" to "using the bdq:sourceAuthority".

I would like to challenge this statement by @chicoreus:

"We must not specify parameters that point implementors to a resource from which the controlled vocabulary for a particular test can be found, that is something for the notes. When the specification says, e.g. compliant if matching ISO vocabulary x, then the implementor must use that vocabulary, and where they get it and how they get it is an implementation detail, not a parameter."

I agree for VALIDATION tests where the vocabulary is written in stone. This is not true of most Darwin Core terms, which make recommendations, not requirements. The philosophy has always been to decouple requirements from definitions wherever possible. All of the AMENDMENT_ tests need a parameter to point to a source for the lookups. If we only used controlled vocabularies, we couldn't do any standardization, because only the standard values would be found, not the values from which the standard values would be determined. I do agree that there is a subset of tests that we currently have as parametrized that need not be. To me, these are only #20 (TG2-VALIDATION_COUNTRYCODE_NOTSTANDARD), #21 (TG2-VALIDATION_COUNTRY_NOTSTANDARD), #59 (TG2-VALIDATION_GEODETICDATUM_NOTSTANDARD), #79 (TG2-VALIDATION_DECIMALLATITUDE_OUTOFRANGE), #162 (TG2-VALIDATION_TAXONRANK_NOTSTANDARD). #21 and #59 will need to be explicit about the expectations. For example, for #21, it must be explicit whether the preferred name is the standard name, or if any of the names in any of the names or codes are acceptable standard names. For #59, it will need to be made explicit whether the EPSG code is the only standard (because it's the only thing that is unambiguous), or if any of the names in Geodetic CRS, Datum, or Ellipsoid are also acceptable.

Again, sorry, especially that it took this long to respond, but it was unavoidable.

@ArthurChapman
Collaborator Author

One issue that @tucotuco's comments bring up is the urgent need for Vocabularies of Values to be created for all the Darwin Core terms that are currently referred to in the tests. Perhaps TG4 (at Leiden?) needs to establish a working group under the TG with the remit to create as many Vocabularies of Values for those terms as are possible in the short term (especially beginning with the easy ones). Some, I think, only have a limited number of terms, but we will need to formalise them under the format that TG4 is proposing to develop. I guess a first step is to make a list, with an assessment of what is required, and a work program. @pzermoglio something for the agenda in Leiden - perhaps discuss informally on the Sunday.

@Tasilee
Collaborator

Tasilee commented Sep 8, 2019

Thanks @tucotuco. Good to have your insights again, but I am struggling. I will repeat a comment I made somewhere among the tests. We have two scenarios for Parameterised

  1. Genuine options for bdq:sourceAuthority (e.g., TG2-VALIDATION_FAMILY_FOUND #28) and
  2. Options for a default value (e.g., TG2-AMENDMENT_LICENSE_STANDARDIZED #133 )

Your comment "we can't make standardizations without a thesaurus (or at least a simple lookup table) - controlled vocabularies are not enough" focuses on the second scenario. But surely we can't anticipate every possible misspelling or incorrectly interpreted 'value' to lookup? I guess I am assuming in at least some of the AMENDMENTS, that we are using pattern matching in the test code to have a stab at interpreting a potential target. Take the example in #133

dc:license="CCZero" becomes dc:license="https://creativecommons.org/publicdomain/zero/1.0/", following the Creative Commons vocabulary.

@tucotuco: You are implying that we have a thesaurus that contains "CCZero"?

As usual, I am probably missing something.

Also, I have to bow to your Darwin Core philosophy: "Darwin Core is not a source authority for values". Our tests are Darwin Core based (and hence scenario 1 above is not applicable), but scenario 2 is. We are indeed stuffed in terms of vocabs (let alone thesauri), hence TG4, but we need to grab onto any straw we currently have, and DwC 'values' are a 'port in a storm'?

@ArthurChapman
Collaborator Author

@Tasilee I think we do need vocabularies/thesauri. License is a difficult one - but CCZero could = CC0 (1.0) or CC0 (1.0) Universal, etc. and then link to https://creativecommons.org/publicdomain/zero/1.0/. Also, with many of the earlier Creative Commons licenses there were many Ports (versions in different languages - see for example, https://creativecommons.org/tag/porting/). Version 4.0 is supposed to be a Universal set without the need for Porting, and that is encouraged for all new uses. A thesaurus would hopefully list these and (maybe) synonymise many.
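
To show what such a thesaurus lookup might amount to in code, here is a minimal Java sketch. It is an assumption-laden illustration, not a proposal for the actual vocabulary: the class `LicenseThesaurus` and the method `standardize` are hypothetical, and the only entries are the three CC0 variants mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup table: variant string -> preferred standard value.
class LicenseThesaurus {
    static final String CC0 = "https://creativecommons.org/publicdomain/zero/1.0/";

    private static final Map<String, String> VARIANTS = new HashMap<>();
    static {
        // Variants taken from this thread; a real thesaurus would hold many more.
        VARIANTS.put("CCZero", CC0);
        VARIANTS.put("CC0 (1.0)", CC0);
        VARIANTS.put("CC0 (1.0) Universal", CC0);
    }

    // Returns the standard value for a known variant, or empty when the
    // thesaurus has no entry (no guessing by pattern matching).
    static Optional<String> standardize(String license) {
        if (license == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(VARIANTS.get(license.trim()));
    }

    public static void main(String[] args) {
        System.out.println(standardize("CCZero"));
        System.out.println(standardize("ex coll."));
    }
}
```

The contrast with a plain controlled vocabulary is that the map keys are the non-standard values; a vocabulary alone would only contain the value on the right-hand side.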

@tucotuco has extracted the licensing records from GBIF. Many (the majority) are in the form of "ex coll. " These aren't very helpful as they just refer back to the original institution, etc. I am looking through the list to see if we can extract a basic set of options - especially with CC, but in addition there are various country licenses (e.g. http://open.canada.ca/en/open-government-licence-canada) and there are ODC licenses (Open Data Commons) - e.g. Open Data Commons Attribution License: http://www.opendatacommons.org/licenses/by/1.0/. I will see what I can come up with when I get time.

@tucotuco
Member

tucotuco commented Sep 9, 2019 via email

@Tasilee
Collaborator

Tasilee commented Sep 9, 2019

@tucotuco - "Pattern matching is an implementation solution". I agree. I was unaware of how relevant thesauri are to our issues - they are a more 'standard' solution that is openly accessible and hopefully understandable.

This reminds me of the eureka moment aeons ago in TDWG (TIP days) when I realized that we needed an effective environment for the creation and management of ontologies. We needed an environment created by 'programmers' that made it easy to add terms, definitions and relationships. As far as I am aware, such a user (application domain specialist)-centric environment still doesn't exist (but I could be wrong as I have not recently researched it).

I think such an environment for biodiversity informatics-related thesauri (term -> preferred standard term, definition, comments and links etc) would be nice. A wiki style of management? A list by itself is a start, but when isolated and without provenance, is less than optimal. Governance is a key issue. If there is an 'authority', grand, but the system still needs to be open to public comment for efficient improvements.

@tucotuco
Member

tucotuco commented Sep 9, 2019 via email

@tucotuco tucotuco changed the title Paramaterized Parameterized May 19, 2020
@Tasilee Tasilee changed the title Parameterized TG2 - Parameterized Jun 17, 2020
@Tasilee
Collaborator

Tasilee commented Jun 29, 2020

We have a quorum to CLOSE.

@Tasilee Tasilee closed this as completed Jun 29, 2020