[Docs update] Examples to be included #135
Comments
I can start producing the example data maps. |
The aim is to have:
Hazard examples
|
Exposure examples
Figure example for each data type (random locations):
see spreadsheet and json metadata for Central Asia residential exposure - current future scenarios |
Loss examples
Note: Use Central Asia SFRARR project / Africa R5 as examples
|
Should we attach a download link for each of the datasets shown in the example? E.g. OSM data for the city shown, hazard layer, etc. |
My understanding is that the purpose of the examples is to help readers to understand how RDLS metadata can be used to describe different aspects of risk datasets. I think that we should aim for the text and screenshots for each example to provide sufficient information about the relevant aspects of the datasets. Otherwise, it would be a lot of extra work for readers to download each example and open it in an appropriate software package. |
@odscjen is it ok to provide examples as this (markdown-html), or should it be turned into json? |
Ultimately we'll want to provide them in both markdown-html AND in JSON. For now markdown is fine and once the spreadsheet template and CoVE are up and running we can convert them into JSON as well. |
@matamadio an important thing when creating these examples is to ensure you're using the field titles and codelist values (you can use the labels rather than the codes for ease of reading) from the schema, and that you include all of the required fields. Looking at the Hazard examples you've got so far, there are a few errors:
Dates should be in YYYY-MM-DD format.
'River flood' isn't in the
For all the examples where |
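A quick way to check the date format point above is sketched below. This is a minimal illustration that assumes dates are held as plain strings; `is_iso_date` is a hypothetical helper name, not part of any RDLS tooling.

```python
from datetime import datetime


def is_iso_date(value: str) -> bool:
    """Return True if value is a valid calendar date written as YYYY-MM-DD."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        # Wrong separators, wrong field order, or an impossible date.
        return False
    # strptime tolerates unpadded months and days (e.g. "2023-1-5"),
    # so require that the value round-trips exactly.
    return parsed.strftime("%Y-%m-%d") == value


for candidate in ("2023-12-31", "31/12/2023", "2023-1-5"):
    print(candidate, is_iso_date(candidate))
```

Running this over the date values in an example spreadsheet before conversion would catch entries that are not zero-padded or that use a different field order.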
Thanks Jen, fixed examples but missing the last comment: still unsure on how I should indicate occurrence probability in the most common case (return period scenarios 1/n). This is the case of the flood models where Analysis type: Probabilistic |
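For reference, the 1/n relationship mentioned above can be sketched as follows. This is a generic illustration of converting a return period to an annual probability, not a statement of how RDLS models it; the function name is hypothetical.

```python
def annual_probability(return_period_years: float) -> float:
    """Convert a return period of n years to an annual probability of 1/n."""
    if return_period_years <= 0:
        raise ValueError("return period must be positive")
    return 1.0 / return_period_years


# Typical return period scenarios for a probabilistic flood model.
for rp in (10, 100, 1000):
    print(f"1 in {rp} years -> annual probability {annual_probability(rp)}")
```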
Sorry that final comment I had misread the schema! I think there are 2 options here:
|
For the sake of quick example, I would pick option 1. |
From today's check-in call with @matamadio and @odscrachel, we agreed that @matamadio will prepare examples using the spreadsheet template using only the relevant fields (i.e. not full RDLS metadata files). We can then convert those into JSON format to store in the repository which should give us the flexibility to present them in the documentation as needed (e.g. using field titles rather than JSON paths). |
Spreadsheet example for the Fathom global dataset.
|
About the example panel:
|
Yep. Given the length of some of the field values, I think it's best to show each in a separate tab. I've tested this out by adding the Fathom hazard example in #196. Please take a look and let me know what you think: https://rdl-standard.readthedocs.io/en/135-examples/reference/schema/#hazard (below the schema reference table). In particular, it would be good to get your feedback on:
The advantages of using separate tables and including identifiers are:
The downside is that it makes the tabular example longer than presenting all the values in the same table and without identifiers. If you're happy with the general approach, then I think the best workflow is for you to do the initial preparation of the examples using the spreadsheet template, we can then convert them to JSON to add to the standard repository and the pre-commit script will handle creating the human-friendly CSVs for display in the documentation. For ongoing maintenance, it will be easiest to edit the JSON files directly. |
Yes, I like this. Separated tables are good. Hiding identifiers would get a cleaner view of key attributes; but I agree it is good to have 1:1 representation of the json. I'll produce additional examples to add in the gdrive folder, nametag _docsample |
See example for exposure, built-up surface (GHS): rdls_exp-GHS_docsample.xlsx Figure: Note 1: unlike the real example provided for Thailand, this one describes the whole global dataset rather than a derived national subset. The attribution is also different. |
Example for Vulnerability: rdls_vln-FL_JRC
Can be used both as a docs snippet and as a full example. |
rdls_vln-FL_JRC.xlsx
|
Thanks for the feedback, sorry for the missing/wrong input!
URL for this example to be replaced with specific resource data (zip to be hosted in GH docs/_datasamples or similar).
Commenting in the excel file
Thanks, this needs to be explained in the description. Please note that this (and other country examples) uses ISO3166-1-alpha2: first-level unit (country), two-letter code.
Otherwise the URL could point to the existing datacatalog page (from where the resource can be requested).
Sorry - they are all hazard type: flood; 1 and 2 process is fluvial flood, while 3 is pluvial flood.
I would put taxonomy as optional here. Originally these were based on Corine Land Cover classes (CLC), but in the end they use their own general taxonomy for splitting curve types. So "internal" is ok. |
'describedby' is correct. It is an IANA link relation type, which are all lowercase. |
ah, okay, this is getting reported as an error in every JSON conversion |
this link for me just goes to a world bank login page (which I obviously can't login to) so I don't think it's an appropriate link to use as it doesn't show anything of the actual data. I think at the moment as these are just examples using a dummy url is the better option. |
Please can you share the data and command(s) that you're using in a new issue? I converted and tested |
@duncandewhurst I used the flatten tool command from that issue but I was using https://www.jsonschemavalidator.net/ for the validation. The schema is definitely the current dev branch schema but I get the following error message
|
Ah, so as I mentioned in the issue description:
As expected, there are no errors when validating against draft 2020-12 using check-jsonschema. |
We need to make sure that, when linking to the datacatalog, we are NOT using https://datacatalog.worldbank.org/int/search/..., which is internal only (and the default when Mat, Pierre, or I copy a link). Make sure to remove the 'int/' so the link is visible externally: https://datacatalog.worldbank.org/search/... |
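The link fix above could be automated with something like the sketch below. It assumes the only difference between the internal and public links is the leading `/int` path segment, as described in the comment; `to_public_catalog_url` and the dataset path in the usage line are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

CATALOG_HOST = "datacatalog.worldbank.org"


def to_public_catalog_url(url: str) -> str:
    """Drop the internal-only '/int' prefix from a datacatalog link."""
    parts = urlsplit(url)
    is_internal = parts.path == "/int" or parts.path.startswith("/int/")
    if parts.netloc == CATALOG_HOST and is_internal:
        # Keep everything after '/int'; fall back to '/' if nothing remains.
        parts = parts._replace(path=parts.path[len("/int"):] or "/")
    return urlunsplit(parts)


# Hypothetical dataset path, for illustration only.
print(to_public_catalog_url("https://datacatalog.worldbank.org/int/search/dataset/0000"))
```

Links on other hosts, or links that are already public, pass through unchanged.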
I agree, following the example for hazard, rather than for exposure looks much better. Easy to tab between each representation of the example, and very clear where to find the examples. |
@matamadio do you have a _docsample version of the vln-FL_JRC example? Also is there one yet for Loss? |
I also created, as a test, a sheet containing 6 zipped resources containing flood hazard map geotiffs. |
With the exception of I updated the JSON files to reflect the latest version of the schema, but I haven't updated the spreadsheets that were used to generate them. I also corrected one semantic error in
To reduce the length of the schema reference page, I've nested the examples with collapsible drop-downs. Where there is more than one example for a component, only the first example is uncollapsed. If there is no figure for an example, it is collapsed. I couldn't find a suitable figure for the Central Asia exposure examples, but I took a screenshot from global flood depth-damage functions PDF to use as a figure for that example:
The row titles in the tabular examples now include the titles of intermediary objects so that it is possible to distinguish between, for example, publisher name and creator name (previously they were both titled 'name'):
To reduce the amount of screen space taken up by the JSON examples, they are now collapsible, with objects and arrays collapsed by default: |
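The idea of including intermediary object titles in row titles can be sketched as below. This is not the pre-commit script from the repository, only a minimal illustration that assumes nested JSON objects and uses property names in place of schema titles; `flatten_titles` and the sample values are hypothetical.

```python
def flatten_titles(obj: dict, prefix: tuple = ()) -> list:
    """Flatten nested objects into (row title, value) pairs.

    Each row title joins the names of the enclosing objects, so that
    'publisher / name' and 'creator / name' stay distinguishable.
    Arrays are not expanded in this sketch; they are emitted as-is.
    """
    rows = []
    for key, value in obj.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            rows.extend(flatten_titles(value, path))
        else:
            rows.append((" / ".join(path), value))
    return rows


example = {
    "publisher": {"name": "Example Publisher"},
    "creator": {"name": "Example Creator"},
}
for title, value in flatten_titles(example):
    print(f"{title}: {value}")
```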
Very nice, thanks.
|
The vln-FL_JRC is ok to use in docs as well, it doesn't include too many attributes anyway. |
Addressed in #214.
Added in #196. I don't think there's anything else to do for this issue until the loss example is ready. Let me know if that's wrong! |
@matamadio and @stufraser1 to discuss and prepare loss examples. |
One example of loss data (results of the analysis) from CCDR: Download THA_RSK.xlsx This represents one specific country, but the same template applies to any country I've been working on.
The tabular data for the ADM scores is also provided as geospatial (gpkg). It does not have an explicit loss curve chart, but has all the elements to build it. @stufraser1 should it fit in the schema in the current state, or do you have any suggestion for better formatting? This is key as I'm just now setting the default for the new year analytics. |
I would say there are sheets in there that wouldn't normally go into the loss component:
My preference for describing these files in RDL Loss would be to include this as a dataset, and give each sheet as its own resource (.csv), rather than an xlsx book so users can see the list of resource descriptions per dataset, rather than them navigating in many sheets, but I see it could be described in metadata using the existing structure with the workbook as a single resource. |
I have some questions about the loss schema. THA_CCDR_RSK_ADM1.xlsx describes loss output for 2 hazards (river floods and coastal floods) over 2 exposure categories. The complete standard output would include 5-6 hazards and 3 exposed categories. The metadata spreadsheet has loss attributes at the dataset level, so I have to create 4 dataset rows. But all of this information is actually in just one file. Should I use the same dataset ID throughout? Or should we rather move all loss attributes into an array? |
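To make the array option concrete, here is a rough sketch of how four dataset-level rows sharing one dataset ID could be nested into a single dataset with an array of loss entries. All field names (`dataset_id`, `hazard`, `exposure`, `losses`) and the exposure category labels are hypothetical placeholders, not the RDLS schema.

```python
from collections import defaultdict

# One row per hazard/exposure combination, as in the spreadsheet approach.
rows = [
    {"dataset_id": "THA_CCDR_RSK_ADM1", "hazard": "river flood", "exposure": "category_a"},
    {"dataset_id": "THA_CCDR_RSK_ADM1", "hazard": "river flood", "exposure": "category_b"},
    {"dataset_id": "THA_CCDR_RSK_ADM1", "hazard": "coastal flood", "exposure": "category_a"},
    {"dataset_id": "THA_CCDR_RSK_ADM1", "hazard": "coastal flood", "exposure": "category_b"},
]


def nest_losses(rows: list) -> list:
    """Group rows by dataset ID and collect their loss attributes in an array."""
    grouped = defaultdict(list)
    for row in rows:
        entry = {k: v for k, v in row.items() if k != "dataset_id"}
        grouped[row["dataset_id"]].append(entry)
    return [{"id": ds_id, "losses": entries} for ds_id, entries in grouped.items()]


datasets = nest_losses(rows)
print(len(datasets), "dataset with", len(datasets[0]["losses"]), "loss entries")
```

With this shape, the single file is described once and each hazard/exposure combination becomes one array entry rather than a repeated dataset row.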
Good catch. |
I also have a couple of issues testing with a return period dataset:
Here is the loss metadata file for use in the loss example: |
images for exposure examples: |
Each row in the I've drafted a PR for the changes proposed in #135 (comment) and #135 (comment):
@stufraser1 @matamadio I will leave it up to you to decide if you want to merge this PR for inclusion in the 0.2 release or leave it for later. My sense is that modelling for loss metadata warrants further exploration (I'll open an issue), but that the changes in the PR are an improvement over the current model so I would merge it. I'll hold off preparing a PR to add the loss examples until we have decided what to do about the schema as if the schema changes the examples will need to be updated. I've also left some comments on the SFRARR example spreadsheet where I think some fields may have been populated incorrectly. @stufraser1 I've shared my feedback on your other questions and suggestions below.
This was discussed at some length in #75, but the conversation in that issue took a different direction so I don't think it was fully resolved. My preferred approach is not to worry about units and instead to model the kind of quantity being measured (currency, in this case) since users can convert between units of the same quantity kind. That is the approach we settled on for exposure metrics and I think it would make sense to have consistent modelling for exposure metrics and impact metrics. However, that is quite a significant change to consider at this stage for 0.2. The alternative solution that I proposed was to add an
The separation of currencies and non-currency units is in keeping with QUDT which is the source we're using for unit codes. It models currencies and non-currency units as separate vocabularies so we should keep them separate too in order to avoid the risk of clashing codes in the event that a currency and non-currency unit share the same code. So the options are:
If needed, we can do option 2 for the 0.2 release and work on option 3 for the next release. Let me know what you want to do.
It seems to me that there is a lot of crossover, but also some differences. For example, the data_calculation_type codelist referenced in I think that this warrants further investigation, but I don't think we'll resolve it in time for 0.2.
Regarding the spreadsheet template, I can see a link in the template and in the rdls_template_loss_SFRARR_eqrisk.xlsx (see below). Where is it missing from? Good catch on the broken link in the documentation, this was because some codelists links in the schema included
This is because
In rdls_template_loss_SFRARR_eqrisk.xlsx, it looks like you might've copy-pasted the value from the
|
Looks like the symbology represents ADM codes; it would be great to show the exposure value attribute with a legend. If you can point me to the dataset, I can produce those maps quickly.
Agree on avoiding unnecessary nesting, it should be just one
I approved the change, as it already improves the usability. I would implement it in 0.2 and wait for the next release for other refinements.
I'd say 2; most intuitively, as an optional field if
I would:
|
I've merged the loss updates PR so that we can release 0.2. The examples will need updating to reflect the schema changes and adding to the schema reference page. That can be done without needing to make another release because the examples themselves aren't normative.
In the interest of not delaying the 0.2 release further, I've left this for now as it will require further discussion ("monetary" is a quantity kind rather than a unit) |
List of examples to be produced and included in Docs
Please add any subject that requires an example (figure, table, other) to be explained properly in the docs.
Aims:
Hazard
Exposure - examples to show multiple data types
Vulnerability
Loss