TG2 - Test Data Framework #189

Closed · Tasilee opened this issue Sep 30, 2020 · 32 comments
@Tasilee (Collaborator) commented Sep 30, 2020

A Zoom discussion on September 29/30 recommended that we develop unit tests for each of the VALIDATIONs. The main justifications (thanks to @tucotuco) were extensibility and minimal maintenance, considering the evolution of the Darwin Core standard on which the TG2 tests are based.

We have 65 VALIDATIONs and would value any assistance in the creation of the unit tests, based on the following template that @chicoreus has proposed, using #187 as an example.

Test VALIDATION_MAXDEPTH_OUTOFRANGE
GUID 3f1db29a-bfa5-40db-9fd1-fde020d81939
Column 1 is the INPUT (one column for each InformationElement in the test)
Columns 2-3 are the parameter values (one column for each Parameter in the test)
Columns 4-6 are the expected output; values in columns 4 and 5 must match exactly.
Column 7 is a remark on the row in this table, not part of the expected output.

See https://github.com/tdwg/bdq/blob/master/tg2/core/testdata/testdata_VALIDATION_MAXDEPTH_OUTOFRANGE_%23187.csv for the latest version of this file.

| dwc:maximumDepthInMeters | bdq:minimumValidDepthInMeters | bdq:maximumValidDepthInMeters | Response.Status | Response.Result | Response.Comment | Remark |
|---|---|---|---|---|---|---|
| 100 | 0 | 11000 | RUN_HAS_RESULT | COMPLIANT | [any human readable explanation, e.g. 100 is in the range 0 to 11000] | |
| 100 | | | RUN_HAS_RESULT | COMPLIANT | [any human readable explanation, e.g. 100 is in the default range 0 to 11000] | |
| 200000 | | | RUN_HAS_RESULT | NOT_COMPLIANT | [any human readable explanation, e.g. 200000 is outside the range 0 to 11000] | |
| 0.4 | | | RUN_HAS_RESULT | COMPLIANT | [any human readable explanation, e.g. 0.4 is in the range 0 to 11000] | |
| 0 | | | RUN_HAS_RESULT | COMPLIANT | [any human readable explanation, e.g. 0 is in the range 0 to 11000] | |
| 11000 | | | RUN_HAS_RESULT | COMPLIANT | [any human readable explanation, e.g. 11000 is in the range 0 to 11000] | |
| thirty | | | INTERNAL_PREREQUISITES_NOT_MET | | [any human readable explanation, e.g. provided value must be a number to be validated] | |
| | | | INTERNAL_PREREQUISITES_NOT_MET | | [any human readable explanation, e.g. a value must be provided to be validated] | |
| null | | | INTERNAL_PREREQUISITES_NOT_MET | | [any human readable explanation, e.g. provided value must be a number to be validated] | |
| -145.3 | | | RUN_HAS_RESULT | NOT_COMPLIANT | [any human readable explanation, e.g. -145.3 is outside the range 0 to 11000] | |
| 1000 | 10 | 100 | RUN_HAS_RESULT | NOT_COMPLIANT | [any human readable explanation, e.g. 1000 is outside the provided parameter range 10 to 100] | [Note: non-default parameters should carry through to the Response.Comment] |
| [no depth specified] | | | INTERNAL_PREREQUISITES_NOT_MET | | [any human readable explanation, e.g. provided value must be a number to be validated] | |
| 115,2 | | | RUN_HAS_RESULT | COMPLIANT | [any human readable explanation, e.g. 115,2 is in the range 0 to 11000, where both . and , are recognized as decimal separators] | |
| 115.2 | | | RUN_HAS_RESULT | COMPLIANT | [any human readable explanation, e.g. 115.2 is in the range 0 to 11000, where both . and , are recognized as decimal separators] | |
| 1,828.8 | | | INTERNAL_PREREQUISITES_NOT_MET | | [any human readable explanation, e.g. comma not recognized as a place separator, provided value must be a number] | [This case needs discussion; it is a plausible input value] |
| 1 828.8 | | | RUN_HAS_RESULT | COMPLIANT | [any human readable explanation, e.g. space recognized as a place separator, provided value is in range 0 to 11000] | [This case needs discussion; this is an implausible value but fits SI expectations] |
| 1,828.8 | | | RUN_HAS_RESULT | COMPLIANT | [any human readable explanation, e.g. leading and trailing spaces should be trimmed, provided value is in range 0 to 11000] | [Note: the input is the string " 1,828.8 " with leading and trailing spaces, but without the quotation marks] |
| 1828.8 | | | RUN_HAS_RESULT | COMPLIANT | [any human readable explanation, e.g. leading and trailing spaces should be trimmed, provided value is in range 0 to 11000] | [Note: the input is the string " 1828.8 " with leading and trailing spaces, but without the quotation marks] |
| -354 | | | RUN_HAS_RESULT | NOT_COMPLIANT | [any human readable explanation, e.g. the value is a negative number and is therefore outside the permissible range] | |
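
As a concrete illustration of how a test suite might consume rows of this shape, here is a minimal, self-contained Java sketch. The `validateMaxDepthInRange` method and `Response` holder are hypothetical stand-ins, not an agreed implementation, and the rows are abbreviated to the status/result columns that must match exactly:

```java
import java.util.Arrays;
import java.util.List;

public class MaxDepthTestSketch {

    /** Minimal holder for the parts of a Response that must match exactly. */
    record Response(String status, String result) {}

    /** Hypothetical stand-in for an implementation of VALIDATION_MAXDEPTH_OUTOFRANGE. */
    static Response validateMaxDepthInRange(String maxDepth, double min, double max) {
        if (maxDepth == null || maxDepth.trim().isEmpty()) {
            return new Response("INTERNAL_PREREQUISITES_NOT_MET", "");
        }
        try {
            double value = Double.parseDouble(maxDepth.trim());
            boolean inRange = value >= min && value <= max;
            return new Response("RUN_HAS_RESULT", inRange ? "COMPLIANT" : "NOT_COMPLIANT");
        } catch (NumberFormatException e) {
            return new Response("INTERNAL_PREREQUISITES_NOT_MET", "");
        }
    }

    public static void main(String[] args) {
        // Columns: input, minParam, maxParam, expectedStatus, expectedResult
        List<String[]> rows = Arrays.asList(
                new String[] {"100", "", "", "RUN_HAS_RESULT", "COMPLIANT"},
                new String[] {"200000", "", "", "RUN_HAS_RESULT", "NOT_COMPLIANT"},
                new String[] {"1000", "10", "100", "RUN_HAS_RESULT", "NOT_COMPLIANT"},
                new String[] {"thirty", "", "", "INTERNAL_PREREQUISITES_NOT_MET", ""});
        for (String[] r : rows) {
            // Blank parameter cells mean the bdq defaults (0 and 11000) apply.
            double min = r[1].isEmpty() ? 0d : Double.parseDouble(r[1]);
            double max = r[2].isEmpty() ? 11000d : Double.parseDouble(r[2]);
            Response actual = validateMaxDepthInRange(r[0], min, max);
            boolean pass = actual.status().equals(r[3]) && actual.result().equals(r[4]);
            System.out.println((pass ? "PASS " : "FAIL ") + Arrays.toString(r));
        }
    }
}
```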
@Tasilee (Collaborator, Author) commented Sep 30, 2020

From @chicoreus

  1. All inputs are assumed to be of type string, and it is the responsibility of the test suite to convert them to appropriate other types when needed (integers, floating point values).
  2. It is the responsibility of the test suite to trim leading and trailing whitespace from each input.

Questions

  1. For non-integer numbers, do we specify, as SI, either comma or period as the decimal separator (thus 146.5 and 146,5 are treated as the same number)? (I think yes.)
  2. For numbers, do we specify, as SI, that only a space may be used to separate every three places in a number, or are we mute on this (e.g. treating "1,000.4" as not a number, treating "1 000.4" as the number 1000.4, and treating "1,000" as the number 1.000 (one, not one thousand))? Or do we not specify, and leave handling of this to the implementation language's number parser (e.g. Java's Integer.parseInt(String s) or Float.parseFloat(String s))? (I'm not sure; one possible behaviour is sketched below.)
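
To make the options concrete, here is a minimal Java sketch of the behaviour contemplated in Question 1, while treating the grouped-digit forms from Question 2 as not-a-number. The method name and the exact policy are illustrative assumptions, not a decision of the group:

```java
import java.util.OptionalDouble;

public class DepthParser {

    /**
     * Parse a depth value, treating either '.' or ',' as the decimal
     * separator (so "146.5" and "146,5" are the same number). Strings
     * with digit grouping such as "1,828.8" or "1 828.8" are rejected,
     * which a test would report as INTERNAL_PREREQUISITES_NOT_MET.
     */
    static OptionalDouble parseDepth(String raw) {
        if (raw == null) return OptionalDouble.empty();
        String s = raw.trim();                        // trimming is the suite's responsibility
        if (s.contains(".") && s.contains(",")) return OptionalDouble.empty();
        s = s.replace(',', '.');                      // normalize comma decimal separator
        try {
            return OptionalDouble.of(Double.parseDouble(s));
        } catch (NumberFormatException e) {           // catches "thirty", "1 828.8", etc.
            return OptionalDouble.empty();
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"146.5", "146,5", " 1828.8 ", "1,828.8", "1 828.8", "thirty"}) {
            System.out.println("\"" + s + "\" -> " + parseDepth(s));
        }
    }
}
```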

@ArthurChapman (Collaborator)

Question 1 - definitely YES (Pity the world doesn't have one standard for this!)

Question 2 - is there an ISO standard or some other standard we can cite for this?

@Tasilee (Collaborator, Author) commented Sep 30, 2020

These are the VALIDATIONs, ordered by Darwin Core term:

Link Dimension Term_Action Lee Arthur Paul John/Paula
#58 Other BASISOFRECORD_EMPTY X X
#104 Other BASISOFRECORD_NOTSTANDARD X X
#77 Name CLASS_NOTFOUND X X
#123 Name CLASSIFICATION_AMBIGUOUS X X
#50 Space COORDINATES_COUNTRYCODE_INCONSISTENT
#56 Space COORDINATES_STATE-PROVINCE_INCONSISTENT
#51 Space COORDINATES_TERRESTRIALMARINE
#87 Space COORDINATES_ZERO
#109 Space COORDINATEUNCERTAINTY_OUTOFRANGE
#62 Space COUNTRY_COUNTRYCODE_INCONSISTENT
#42 Space COUNTRY_EMPTY X X
#21 Space COUNTRY_NOTSTANDARD
#98 Space COUNTRYCODE_EMPTY X X
#20 Space COUNTRYCODE_NOTSTANDARD
#69 Time DATEIDENTIFIED_NOTSTANDARD
#76 Time DATEIDENTIFIED_OUTOFRANGE
#147 Time DAY_NOTSTANDARD X X X
#125 Time DAY_OUTOFRANGE
#103 Other DCTYPE_EMPTY X X
#91 Other DCTYPE_NOTSTANDARD X X
#119 Space DECIMALLATITUDE_EMPTY X X
#79 Space DECIMALLATITUDE_OUTOFRANGE
#96 Space DECIMALLONGITUDE_EMPTY X X
#30 Space DECIMALLONGITUDE_OUTOFRANGE
#131 Time ENDDAYOFYEAR_OUTOFRANGE
#88 Time EVENT_TEMPORAL_EMPTY
#33 Time EVENTDATE_EMPTY
#67 Time EVENTDATE_INCONSISTENT
#66 Time EVENTDATE_NOTSTANDARD
#36 Time EVENTDATE_OUTOFRANGE
#28 Name FAMILY_NOTFOUND X X
#122 Name GENUS_NOTFOUND X X
#78 Space GEODETICDATUM_EMPTY X X
#59 Space GEODETICDATUM_NOTSTANDARD
#95 Space GEOGRAPHY_AMBIGUOUS
#139 Space GEOGRAPHY_NOTSTANDARD
#81 Name KINGDOM_NOTFOUND X X
#99 Other LICENSE_EMPTY X X
#38 Other LICENSE_NOTSTANDARD X X
#40 Space LOCATION_EMPTY X X
#187 Space MAXDEPTH_OUTOFRANGE X X X
#112 Space MAXELEVATION_OUTOFRANGE
#24 Space MINDEPTH_GREATERTHAN_MAXDEPTH
#107 Space MINDEPTH_OUTOFRANGE X X
#108 Space MINELEVATION_GREATERTHAN_MAXELEVATION
#39 Space MINELEVATION_OUTOFRANGE
#126 Time MONTH_NOTSTANDARD X X X
#47 Other OCCURRENCEID_EMPTY X X
#23 Other OCCURRENCEID_NOTSTANDARD X X
#117 Other OCCURRENCESTATUS_EMPTY X X
#116 Other OCCURRENCESTATUS_NOTSTANDARD X X
#83 Name ORDER_NOTFOUND X X
#22 Name PHYLUM_NOTFOUND X X
#101 Name POLYNOMIAL_INCONSISTENT X X
#82 Name SCIENTIFICNAME_EMPTY X X
#46 Name SCIENTIFICNAME_NOTFOUND X X
#130 Time STARTDAYOFYEAR_OUTOFRANGE
#70 Name TAXON_AMBIGUOUS X X
#105 Name TAXON_EMPTY X X
#121 Name TAXONID_AMBIGUOUS X X
#120 Name TAXONID_EMPTY X X
#161 Name TAXONRANK_EMPTY X X
#162 Name TAXONRANK_NOTSTANDARD X X
#49 Time YEAR_EMPTY X X X
#84 Time YEAR_OUTOFRANGE X X X
#29 Other ANNOTATION_NOTEMPTY X X
#72 All DATAGENERALIZATIONS_NOTEMPTY X X
#94 Other ESTABLISHMENTMEANS_NOTEMPTY X X

Can I suggest @tucotuco makes a start on the SPACE ones, @ArthurChapman on the NAME ones, @chicoreus on the TIME ones and @Tasilee on OTHER and NOTIFICATIONS? Hopefully a few others will offer some help, at least for checking.

@chicoreus (Collaborator)

I've updated the table slightly, changing 143.5 to a negative value so that the not-compliant result makes sense, adding a remarks column with notes about the tests, and making more explicit the two tests at the end which have leading and trailing space characters as part of the test value. I've also clarified the explanatory text at the top of the table and added examples of human readable explanations where they were absent.

@chicoreus (Collaborator)

@ArthurChapman, regarding (2): "1 828.8" (without the quotes) is 1000 fathoms in meters, with a period as the decimal separator and a space separating every three digits. That is an expected SI format for publication, but a very unnatural form for electronic Darwin Core data, where "1828.8" or "1828,8", serialized from some floating point representation by software into some form of data exchange document, would be the expected values (the localization of the software doing the serialization makes the choice of comma or period as the decimal separator, but most software does not add space separators every three digits in serialized data). My tendency would be to say that we can expect to see "1828.8" or "1828,8" in abundance in the wild, but not "1,828.8" or "1 828,8", and that we should either not specify how these cases are handled, or say that both are expected to be INTERNAL_PREREQUISITES_NOT_MET as a general expectation for all Darwin Core data. For standards, we should probably look for RFCs for serialization of numeric data, rather than ISO or SI representation, as the (numeric or date) values found in data sets will in large part be serializations into uncontrolled string fields of strongly typed database fields (or less strongly typed and variously formatted spreadsheet columns...).

chicoreus added a commit that referenced this issue Oct 1, 2020
…a for #187.  Filename suggests pattern, testdata_{humanreadablenameoftest}.csv for such test data sets
@ArthurChapman (Collaborator)

Thanks a million, Lee - give me the easy one :-)

@ArthurChapman (Collaborator)

Thanks @chicoreus I agree with what you suggest, although - certainly in Australia - I think "1,828.8" would be common but happy to have it treated as you suggest.

BTW - what is the easiest way to open that file as an Excel file? Can you send it to me separately as just a csv? Copying and pasting doesn't seem to work.

@tucotuco (Member) commented Oct 1, 2020 via email

chicoreus added a commit that referenced this issue Oct 1, 2020
@chicoreus (Collaborator)

Have data for the time tests in progress, and will accept working on the rest of the time test data.

chicoreus added a commit that referenced this issue Oct 1, 2020
…name consistent with case of test label.
@chicoreus (Collaborator)

@ArthurChapman the best way to obtain the csv files is with the raw link. For example, for https://github.com/tdwg/bdq/blob/master/tg2/core/testdata/testdata_VALIDATION_MAXDEPTH_OUTOFRANGE.csv, to the upper right of the table are the buttons Raw and Blame. Raw takes you to the raw csv file https://raw.githubusercontent.com/tdwg/bdq/master/tg2/core/testdata/testdata_VALIDATION_MAXDEPTH_OUTOFRANGE.csv - which is important in these cases, as the data values may be numbers not in quotes, or numbers in quotes as strings with whitespace padding.

I've added the maximum 32 bit signed and 32 bit unsigned integer values, plus those values with 1 added and those values with 2 added, plus the name of the term under test (e.g. dwc:day="day"), to each of the three sets of test data I've got up so far. -1, 0, and the maximum integer values are good test values to add for any term that takes numeric data (see the sketch below).
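
A sketch of how such boundary inputs could be generated for any numeric term; the helper name is hypothetical, and the values follow the comment above (32 bit signed and unsigned maxima, each plus 1 and plus 2, along with -1, 0, and the term's own name):

```java
import java.util.ArrayList;
import java.util.List;

public class BoundaryValues {

    /** Boundary strings worth testing for any Darwin Core term that takes numeric data. */
    static List<String> numericBoundaryInputs(String termName) {
        List<String> values = new ArrayList<>(List.of("-1", "0"));
        long signedMax = Integer.MAX_VALUE;           // 2147483647, max 32-bit signed
        long unsignedMax = 4294967295L;               // 2^32 - 1, max 32-bit unsigned
        for (long base : new long[] {signedMax, unsignedMax}) {
            values.add(Long.toString(base));
            values.add(Long.toString(base + 1));      // just past the boundary
            values.add(Long.toString(base + 2));
        }
        values.add(termName);                         // e.g. dwc:day="day", the term's own name
        return values;
    }

    public static void main(String[] args) {
        numericBoundaryInputs("day").forEach(System.out::println);
    }
}
```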

@ArthurChapman (Collaborator) commented Oct 1, 2020

@chicoreus You have an error in testdata_VALIDATION_MAXDEPTH_OUTOFRANGE.csv: in lines 19 and 20, for the default for bdq:minimumValidDepthInMeters, depth can never be a negative number, so 18 has to be NOT_COMPLIANT.

Also lines 23 and 24 appear identical

@ArthurChapman (Collaborator)

@chicoreus in testdata_VALIDATION_DAY_NOTSTANDARD.csv some lines appear to be duplicates.

@ArthurChapman (Collaborator)

@chicoreus in testdata_VALIDATION_MONTH_NOTSTANDARD.csv

Lines 4 and 5 appear to be duplicates

Lines 39, 40, 41 should be NOT_COMPLIANT

Should we include "01" etc.?

@chicoreus (Collaborator)

@ArthurChapman in testdata_VALIDATION_MAXDEPTH_OUTOFRANGE.csv, lines 19 and 20 are both correct. They are testing cases where the provided parameter values are outside the defaults, that is, whether the test listens to the provided parameters or treats the defaults as hard limits.

For testdata_VALIDATION_DAY_NOTSTANDARD.csv, check the raw csv file, the duplicated lines are probably cases where leading or trailing spaces are present in one line but not another.

For testdata_VALIDATION_MONTH_NOTSTANDARD.csv, lines 4 and 5 differ in whitespace in the input: line 4 has the string "1", line 5 the string " 1" with a leading space. Lines 39-41 are indeed in error.

Yes, leading zeros make sense to test. I have added them.

chicoreus added a commit that referenced this issue Oct 1, 2020
…ng leading zeros to tests. Fixing NOT_COMPLIANT out of range month 13.
@ArthurChapman (Collaborator)

Thanks @chicoreus. I still think it is misleading for the default depth to be a negative number, as that is not allowed.

From Georeferencing Best Practices

DEPTH "A measurement of the vertical distance below a vertical datum. In this document, we try to modify the term to signify the medium in which the measurement is made. Thus, "water depth" is the vertical distance below an air-water interface in a waterbody (ocean, lake, river, sinkhole, etc.). Compare distance above surface. Depth is always a non-negative number."

@chicoreus (Collaborator)

If depth is distance from a vertical datum, and depth represents a vertical distance below an air-water interface, then negative values of depth are possible. Consider a vertical datum of mean sea level, and a sample collected in the intertidal, below the surface of the water at a high tide, above the mean sea level vertical datum. Such a sample would be both collected below the air-water interface and at a distance above (thus negative from) the vertical datum. If, however, depth can never be a negative value, then we need to be explicit about that in the specification for VALIDATION_MAXDEPTH_OUTOFRANGE and other depth related tests, such that the test is explicit that, regardless of the parameterization, zero is the smallest allowed value for depth, and that even if a negative value is provided as a parameter, the test must still return NOT_COMPLIANT for depths smaller than zero.

@ArthurChapman (Collaborator)

What you are describing, Paul, is:

distance above surface
In addition to elevation and depth, a measurement of the vertical distance above a reference point, with a minimum and a maximum distance to cover a range. For surface terrestrial locations, the reference point should be the elevation at ground level. Over a body of water (ocean, sea, lake, river, glacier, etc.), the reference point for aerial locations should be the elevation of the air-water interface, while the reference point for sub-surface benthic locations should be the interface between the water and the substrate. Locations within a water body should use depth rather than a negative distance above surface. Distances above a reference point should be expressed as positive numbers, while those below should be negative. The maximum distance above a surface will always be a number greater than or equal to the minimum distance above the surface. Since distances below a surface are negative numbers, the maximum distance will always be a number less than or equal to the minimum distance. Compare altitude.

chicoreus added a commit that referenced this issue Oct 2, 2020
@tucotuco (Member) commented Oct 2, 2020 via email

chicoreus added a commit that referenced this issue Oct 2, 2020
…ble comment in one line and adding a file with examples of non-printing characters (unicode u0000, u0007, and u0020), for discussion of the definition of EMPTY() in #111 and #152.
chicoreus added a commit that referenced this issue Oct 2, 2020
…tion, updating status and comment for negative values in test data for #187.
@chicoreus (Collaborator)

@tucotuco I'm confused. If depth is defined as distance below a vertical datum, and the data as you specify are:

1 m below the surface of the ocean stuck to a rock at a 2 m high tide.
Elevation: 2 m
Vertical Datum: EGM1996
Depth: 1 m
Distance above surface: 0 m

Doesn't this mean that the vertical datum is the datum for both elevation and depth, and the point is both 2 meters above this datum and one meter below this datum, and at the water surface all at the same time?

Shouldn't the values be:
1 m below the surface of the ocean stuck to a rock at a 2 m high tide (2 meters above local Mean Sea Level).
Elevation: 1 m
Vertical Datum: MSL
Depth: null
Distance below surface: 1 m

This tells us that the sample was collected 1 meter above mean sea level for that location, and was 1 meter below the surface of the water at that time.

For nearshore and intertidal localities, particularly with historical data, vertical position is most likely known based on a local mean low tide, mean tide, or mean high tide datum, which may or may not be translatable from the provided data to a global vertical datum.

ArthurChapman added a commit that referenced this issue Oct 6, 2020
In accord with #189 added test data file for TAXONRANK_EMPTY #161
ArthurChapman added a commit that referenced this issue Oct 6, 2020
In accord with #189 added test data file for TAXONRANK_NOTSTANDARD #162
Tasilee added a commit that referenced this issue Oct 6, 2020
In accordance with #189, added file testdata_NOTIFICATION_ANNOTATION_NOTEMPTY_#29.csv
Tasilee added a commit that referenced this issue Oct 6, 2020
In accordance with #189, added file testdata_NOTIFICATION_DATAGENERALIZATIONS_NOTEMPTY_#72 for #72
Tasilee added a commit that referenced this issue Oct 7, 2020
In accordance with #189, added testdata_NOTIFICATION_ESTABLISHMENTMEANS_NOTEMPTY_#94.csv for #94
Tasilee added a commit that referenced this issue Oct 7, 2020
In accordance with #189, added test data testdata_VALIDATION_BASISOFRECORD_EMPTY_#58.csv for #58
Tasilee added a commit that referenced this issue Oct 7, 2020
In accordance with #189, added testdata_VALIDATION_BASISOFRECORD_NOTSTANDARD_#104.csv for #104
Tasilee added a commit that referenced this issue Oct 7, 2020
In accordance with #189, added testdata_VALIDATION_DCTYPE_EMPTY_#103.csv for #103
Tasilee added a commit that referenced this issue Oct 7, 2020
In accordance with #189, added testdata_VALIDATION_DCTYPE_NOTSTANDARD_#91.csv for #91
Tasilee added a commit that referenced this issue Oct 7, 2020
In accordance with #189, added testdata_VALIDATION_LICENCE_EMPTY_#99.csv for #99
Tasilee added a commit that referenced this issue Oct 7, 2020
In accordance with #189, added testdata_VALIDATION_LICENSE_NOTSTANDARD_#38.csv for #38
Tasilee added a commit that referenced this issue Oct 7, 2020
In accordance with #189, added testdata_VALIDATION_OCCURRENCEID_EMPTY_#47.csv for #47
Tasilee added a commit that referenced this issue Oct 7, 2020
In accordance with #189, added testdata_VALIDATION_OCCURRENCEID_NOTSTANDARD_#23.csv for #23
Tasilee added a commit that referenced this issue Oct 7, 2020
In accordance with #189, added testdata_VALIDATION_OCCURRENCESTATUS_EMPTY_#117.csv for #117
Tasilee added a commit that referenced this issue Oct 7, 2020
In accordance with #189, added testdata_VALIDATION_OCCURRENCESTATUS_NOTSTANDARD_#116.csv for #116
@Tasilee (Collaborator, Author) commented Oct 8, 2020

I had a chat with @ArthurChapman after we discussed some of the issues arising, and we figure there are at least the following issues to discuss once we have all completed our test data.

  1. Response.comment: Do we use a consistent phrasing, as in for example "[any human readable explanation, e.g. bdq:annotation is NOTEMPTY]"?
  2. How do we include [any non-printing characters]?
  3. Where do we need to include an explanation?
  4. bdq:annotation or w3c:annotation or oa:annotation or ... If we need our own definition, maybe bdq:namespace / vocab entry could be useful?
  5. Should we use "NOTIFY if ..." instead of "REPORT if ..." for the NOTIFICATIONs?
  6. ...? Please add issues when they arise.

@ArthurChapman (Collaborator)

@Tasilee added some columns to the table above for ticking off test data files that have been checked by each of us.

ArthurChapman added a commit that referenced this issue Oct 8, 2020
In accord with #189 added test data file for #42
ArthurChapman added a commit that referenced this issue Oct 8, 2020
In accord with #189 added test data file for #98
ArthurChapman added a commit that referenced this issue Oct 8, 2020
In accord with #189 added test data file for #119
ArthurChapman added a commit that referenced this issue Oct 8, 2020
In accord with #189 added test data file for test #119
ArthurChapman added a commit that referenced this issue Oct 9, 2020
In accord with #189 added tests data file for #78
ArthurChapman added a commit that referenced this issue Oct 9, 2020
In accord with #189 added test data file for #40
ArthurChapman added a commit that referenced this issue Oct 9, 2020
In accord with #189 added test data file for #107
chicoreus added a commit that referenced this issue Oct 12, 2020
… changing data values to be consistent with basis of record, adding explicit alternative vocabularies, clarifying human readable messages, adding column to specify source authority, adding cases for all valid vocabulary values, adding a range of cases for problematic values.
chicoreus added a commit that referenced this issue Oct 12, 2020
…ests for #104, fixing u0000 value in non-printing characters test for #49, all as per #189.
@Tasilee (Collaborator, Author) commented Jan 5, 2021

I created an Excel file (emailed) with worksheets that support one or more test templates from the test datasets done so far (27 SPACE and TIME missing). In doing so (as anticipated), a number of issues arose. Given the propensity of the 99 tests (plus some 'non-printing character' versions) to diverge from a standard template, can I suggest that we use the worksheets (as CSVs)? Currently there are 7, but a) we aren't done yet and b) there may be a way of combining some of the test datasets.

If we combine tests with the same template into a single worksheet, it is simple to edit. I have organized the data so that it can be sorted easily. The single worksheet makes it easier for me to understand the test data and the same will be true of all those who will be using them.

  1. Can we combine 'Response.comment' and 'Explanation'? There are not many 'Explanations'. We could use a delimiter such as "|" after the response.
  2. Some Response.comments seem out of context, e.g. for COUNTRYCODE_EMPTY the Response.comment is "[any human readable explanation, e.g. dwc:taxonRANK is EMPTY]". Use of "dwc:taxonRank" has also been applied universally across the SPACE tests that Arthur has added. Related: do we need "[any human readable explanation, e.g. dwc:taxonRank is not EMPTY]" to be the Response.comment against many value entries? This is a good example of where it would be very easy to edit all those responses in one place.
  3. Do we use "[non-printing characters]" or the characters themselves, as Paul has done in two separate test datasets, or entries such as "…"? For 'non-printing characters', can we use ISO-8859-1 codes in some form like ISO8859:20 or similar? Having separate non-printing character test sets seems to head away from what I am proposing. We need a standard strategy (see the sketch after this list for one option).
  4. Do we detail all valid values where that is a small set as in DAY_NOTSTANDARD and MONTH_NOTSTANDARD as Paul has done? If so, then do we also do that for Darwin Core vocabs such as BASISOFRECORD_NOTSTANDARD?
  5. Many datasets requiring "bdq:sourceAuthority" were missing this column. I have added these into the original files (and composite worksheets).
  6. Some "bdq:sourceAuthority" entries were missing so I have added them into the original files.
  7. What standard do we use for values of "bdq:sourceAuthority"? For example, should the default reference be added to all test data lines? Currently it is not. I have, for the moment, added a single reference to the first line of each relevant test dataset. There are also several entries "Parameterized Source Authority" that don't seem explicit enough to me.
  8. Do we need an entry for each test data line for "bdq:sourceAuthority.response"? Currently more than 50% are missing any response.
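
One possible strategy for point 3, shown only as a sketch and not a decision: keep the CSV printable by writing escapes such as \u0000 and decode them when loading, then exercise a candidate EMPTY() against the decoded string. Both helper names here are hypothetical:

```java
public class NonPrintingSketch {

    /** Decode \\uXXXX escapes in a test-data cell into actual characters. */
    static String decodeEscapes(String cell) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < cell.length(); ) {
            if (cell.startsWith("\\u", i) && i + 6 <= cell.length()) {
                out.append((char) Integer.parseInt(cell.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(cell.charAt(i++));
            }
        }
        return out.toString();
    }

    /** One candidate reading of EMPTY(): null, zero length, or whitespace only. */
    static boolean isEmpty(String value) {
        return value == null || value.isBlank();
    }

    public static void main(String[] args) {
        // Note: u0020 (space) is whitespace, but u0007 (BEL) is non-printing
        // yet not whitespace, which is exactly the point needing discussion.
        for (String cell : new String[] {"", "\\u0020", "\\u0007", "abc"}) {
            String decoded = decodeEscapes(cell);
            System.out.println("'" + cell + "' -> EMPTY()=" + isEmpty(decoded));
        }
    }
}
```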

chicoreus added a commit to FilteredPush/bdqtestrunner that referenced this issue Mar 7, 2022
…o convert the rows in @Tasilee's data sheet in the spreadsheet of tests into a csv file suitable for input into a test harness.  Supporting tdwg/bdq#189 used to generate https://github.com/tdwg/bdq/blob/master/tg2/core/TG2_test_validation_data.csv
@chicoreus (Collaborator)

For the validation data, see:

https://github.com/tdwg/bdq/blob/master/tg2/core/TG2_test_validation_data.csv
and
https://github.com/tdwg/bdq/blob/master/tg2/core/TG2_test_validation_data_nonprintingchars.csv

These csv files are generated from @Tasilee's spreadsheet into a form that is more readily consumed by a test validation framework, by code in https://github.com/FilteredPush/bdqtestrunner

These csv files and guidance for their use are being assembled for TDWG standards track submission in:
https://github.com/tdwg/bdq/tree/master/tg2/_review/docs/implementers
