
Unable to fetch CSWP (cybersecurity white paper) documents #93

Closed
ronaldtse opened this issue Jun 26, 2023 · 25 comments
Labels
bug Something isn't working

Comments

@ronaldtse
Contributor

$ relaton fetch "NIST CSWP 04162018"
[relaton-nist] ("NIST CSWP 04162018") fetching...
[relaton-nist] WARNING: no match found online for NIST CSWP 04162018. The code must be exactly like it is on the standards website.
No matching bibliographic entry found
@ronaldtse ronaldtse added the bug Something isn't working label Jun 26, 2023
@andrew2net
Contributor

@ronaldtse we use the DOI to create the document identifier. The number 04162018 appears only in the resource URL. Should we create document identifiers from the URL instead?

    ...
    <doi_data>
        <doi>10.6028/NIST.CSWP.6</doi>
        <resource>https://nvlpubs.nist.gov/nistpubs/CSWP/NIST.CSWP.04162018.pdf</resource>
    </doi_data>
    ...
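The mismatch can be illustrated with a minimal Python sketch. The helpers below are hypothetical (not relaton-nist's actual code) and assume NIST DOIs have the form 10.6028/NIST.<SERIES>.<number>, as in the record above. One builds an identifier from the DOI; the other pulls the number out of the resource URL's file name (Python 3.9+ for `str.removesuffix`).

```python
import re

# Hypothetical pattern, assuming NIST DOIs look like "10.6028/NIST.CSWP.6".
NIST_DOI = re.compile(r"^10\.6028/NIST\.(?P<series>[A-Z]+)\.(?P<number>[\w.-]+)$")


def doi_to_docid(doi: str) -> str:
    """Build "NIST <SERIES> <number>" from a NIST DOI."""
    m = NIST_DOI.match(doi)
    if m is None:
        raise ValueError(f"not a NIST DOI: {doi!r}")
    return f"NIST {m['series']} {m['number']}"


def url_number(url: str) -> str:
    """Return the part after "NIST.<SERIES>." in the PDF file name."""
    stem = url.rsplit("/", 1)[-1].removesuffix(".pdf")
    return stem.split(".", 2)[2]


doi = "10.6028/NIST.CSWP.6"
url = "https://nvlpubs.nist.gov/nistpubs/CSWP/NIST.CSWP.04162018.pdf"
print(doi_to_docid(doi))  # NIST CSWP 6
print(url_number(url))    # 04162018
```

The two values disagree ("6" vs "04162018"), so a query using the URL number can never match an identifier derived from the DOI.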

@ronaldtse
Contributor Author

@andrew2net CSWP PubIDs only use the date, not whatever random number they have in the DOI. So we need to fix this.

This is from the NIST PubID document:
[Screenshots: CSWP examples from the NIST PubID syntax document]

PubID_Syntax_NIST_TechPubs.pdf

@ronaldtse
Contributor Author

Notice also that the URL uses the wrong date order: it says "04162018", but according to the PubID syntax it should be 20180416 (this is exactly the second CSWP example in the document).
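The reordering is a fixed permutation of the date fields. A minimal Python sketch, assuming the URL number is always month-day-year (true for the two URLs in this thread, not verified for every CSWP); parsing through `datetime` also rejects strings that are not real dates:

```python
from datetime import datetime


def url_date_to_pubid_date(mmddyyyy: str) -> str:
    """Convert a URL-style date such as "04162018" into PubID order "20180416"."""
    # strptime raises ValueError if the string is not a valid MMDDYYYY date.
    return datetime.strptime(mmddyyyy, "%m%d%Y").strftime("%Y%m%d")


print(url_date_to_pubid_date("04162018"))  # 20180416
```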

@ronaldtse
Contributor Author

@andrew2net can you build the CSWP PubID using pubid-nist?

@mico

mico commented Aug 16, 2023


Yes, CSWP identifiers comply with PubID 1.0

@andrew2net
Contributor

@andrew2net can you build the CSWP PubID using pubid-nist?

@ronaldtse There are many other NIST IDs that cannot be parsed by pubid-nist: metanorma/pubid-nist#177

@andrew2net
Contributor

@andrew2net CSWP PubIDs only use the date, not whatever random number they have in the DOI. So we need to fix this.

@ronaldtse we could create IDs from URLs for CSWP, but the URLs aren't consistent. These URLs contain IDs similar to the DOI:

So these documents' IDs will stay as they are now. Is that OK?

@ronaldtse
Contributor Author

Seems that this document is really CSWP 20:
[Screenshot]

Can you please list out all the CSWPs so we know what's going on?

@andrew2net
Contributor

@ronaldtse the updated pubs-export has DOI-like docidentifiers:

  {
    "language": "en",
    "script": "Latn",
    "series": "csrc-white-paper",
    "docnumber": "6",
    "docidentifier": "CSWP 6",
    "revision": null,
    "edition": null,
    "volume": null,
    "uri": "https://csrc.nist.gov/pubs/cswp/6/cybersecurity-framework-v11/final",
    "doi": "10.6028/NIST.CSWP.6",
    "title-main": "Framework for Improving Critical Infrastructure Cybersecurity, Version 1.1",
    "title-sub": null,
    "iteration": "final",
    "issued-date": null,
    "updated-date": null,
    "published-date": "2018-04-16",
    "obsoleted-date": null,
    "status": "final",
    "substage": "active",
    ...

Do we really need to use date-based IDs for CSWP?
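For comparison, both candidate identifiers can be derived from the export record above. A minimal Python sketch using a trimmed copy of that record; the function names are hypothetical, the date-based form assumes the PubID is simply "NIST CSWP" followed by the YYYYMMDD publication date, and the consistency check assumes the DOI always ends in ".<docnumber>" as it does here:

```python
import json

# Trimmed copy of the pubs-export record above (only the fields used here).
RECORD = json.loads("""
{
  "series": "csrc-white-paper",
  "docnumber": "6",
  "docidentifier": "CSWP 6",
  "doi": "10.6028/NIST.CSWP.6",
  "published-date": "2018-04-16"
}
""")


def number_based_pubid(record: dict) -> str:
    """PubID built from the export's own docidentifier, e.g. "NIST CSWP 6"."""
    return f"NIST {record['docidentifier']}"


def date_based_pubid(record: dict) -> str:
    """Assumed date-style PubID: "NIST CSWP " + YYYYMMDD from published-date."""
    return "NIST CSWP " + record["published-date"].replace("-", "")


def doi_matches_number(record: dict) -> bool:
    """Check that the DOI suffix agrees with docnumber."""
    return record["doi"].endswith("." + record["docnumber"])


print(number_based_pubid(RECORD))  # NIST CSWP 6
print(date_based_pubid(RECORD))    # NIST CSWP 20180416
print(doi_matches_number(RECORD))  # True
```

The number-based form needs no transformation at all, while the date-based form depends on "published-date" being present and correct.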

@ronaldtse
Contributor Author

Seems that the reference has really been changed to "CSWP 6". I've asked for clarification from NIST.

@ronaldtse
Contributor Author

@andrew2net the problem is that "CSWP 6" still doesn't work:

$ bundle exec relaton fetch "NIST CSWP 6"
[relaton-nist] ("NIST CSWP 6") fetching...
[relaton-nist] WARNING: no match found online for NIST CSWP 6. The code must be exactly like it is on the standards website.
No matching bibliographic entry found

andrew2net added a commit that referenced this issue Aug 19, 2023
@andrew2net
Contributor

Fixed in v1.14.9:

$ relaton fetch "NIST CSWP 6"
[relaton-nist] ("NIST CSWP 6") fetching...
[relaton-nist] ("NIST CSWP 6") found NIST CSWP 6
<bibdata type="standard" schema-version="v1.2.3">
  <fetched>2023-08-19</fetched>
  ...

@ronaldtse
Contributor Author

@andrew2net It's not working for me:

Using relaton-nist 1.14.9
...
$ bundle exec relaton fetch 'NIST CSWP 6'
[relaton] (NIST CSWP 6) not found.
No matching bibliographic entry found

@ronaldtse ronaldtse reopened this Aug 20, 2023
@andrew2net
Contributor

@ronaldtse the message "[relaton] (NIST CSWP 6) not found." comes from the relaton gem, which acts as a cache. A response from the previous version of relaton-nist was stored in the cache. We used to have functionality that cleared the cache when a gem's version changed, but we moved to schema version control, so you now need to run "relaton db clear".

@ronaldtse
Contributor Author

ronaldtse commented Aug 21, 2023


Then this is really confusing. Users will never be able to figure this out.

When the Relaton-xxx gem is updated, should the cache be wiped? At least the "not found" ones?

This information needs to be described in the output:

  • It should say something like: "Not found in cache. If you wish to ignore the cache, run with the 'ignore cache' option or wipe the cache."

In any case, we must differentiate a "cache hit not found" vs the "actual not found".
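One way to meet both requests can be sketched in Python, assuming a simple in-memory cache (the class and method names are hypothetical, not the relaton gem's API): tag each cached entry with the fetcher version that produced it, re-fetch cached misses after an upgrade, and report whether a miss came from the cache or from an online lookup.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Lookup:
    ref: str
    doc: Optional[str]  # None means "not found"
    from_cache: bool

    def message(self) -> str:
        if self.doc is not None:
            return f"({self.ref}) found"
        if self.from_cache:
            return f"({self.ref}) not found (cached result); clear the cache to retry online"
        return f"({self.ref}) not found online"


@dataclass
class Cache:
    fetcher_version: str
    # ref -> (fetcher version that produced the entry, document or None)
    entries: Dict[str, Tuple[str, Optional[str]]] = field(default_factory=dict)

    def lookup(self, ref: str, fetch_online: Callable[[str], Optional[str]]) -> Lookup:
        hit = self.entries.get(ref)
        if hit is not None:
            version, doc = hit
            # Reuse any positive hit; reuse a cached miss only if the current
            # fetcher version produced it, so an upgrade retries old misses.
            if doc is not None or version == self.fetcher_version:
                return Lookup(ref, doc, from_cache=True)
        doc = fetch_online(ref)
        self.entries[ref] = (self.fetcher_version, doc)
        return Lookup(ref, doc, from_cache=False)
```

With this design, a miss recorded by an older fetcher version is retried automatically, while repeated misses under the same version still report that they came from the cache.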

@ronaldtse
Contributor Author

I confirm that I can fetch CSWP 6 now. Closing and moving the remaining issue to a new ticket.

@ronaldtse
Contributor Author

However, I still cannot fetch this:

$ bundle exec relaton fetch "NIST CSWP 01162020"
[relaton-nist] ("NIST CSWP 01162020") fetching...
[relaton] Downloaded index from https://raw.githubusercontent.com/relaton/relaton-data-nist/main/index-v1.zip
[relaton-nist] WARNING: no match found online for NIST CSWP 01162020. The code must be exactly like it is on the standards website.
No matching bibliographic entry found

@ronaldtse
Contributor Author

Actually this document at the NIST Library is now called "CSWP 10". I'll close this ticket for now. I wonder what the CSRC entry looks like.

@ronaldtse
Contributor Author

I received a clarification from @jfnist on CSWPs:

the NIST Library retrospectively assigned unique, sequential identifiers using the new PubID syntax. Each now has a new PubID (e.g., NIST CSWP 29 ipd) and a DOI to match. Any original DOIs that incorporated a release date will still work.

So for CSWP documents (from the CSRC Metanorma feed), we can directly use the CSWP document number for their PubID.

For users who have been using the old CSWP IDs, they will have to manually find out what the new numbers are. Perhaps we could maintain a mapping on the blog? Thoughts @andrew2net ?

@andrew2net
Contributor


@ronaldtse we can create an index with both ID versions, so it will be possible to use either of them.
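A minimal Python sketch of the dual-ID idea, assuming the index maps a canonical PubID to a document location. The alias table holds only the two legacy/new pairs that come up in this thread, and the index file names are hypothetical:

```python
from typing import Dict, Optional

# Illustrative only: the two legacy/new pairs discussed in this thread.
LEGACY_CSWP_IDS: Dict[str, str] = {
    "NIST CSWP 04162018": "NIST CSWP 6",
    "NIST CSWP 01162020": "NIST CSWP 10",
}

# Hypothetical index: canonical PubID -> stored document.
INDEX: Dict[str, str] = {
    "NIST CSWP 6": "data/nist-cswp-6.yaml",
    "NIST CSWP 10": "data/nist-cswp-10.yaml",
}


def resolve(ref: str, index: Dict[str, str]) -> Optional[str]:
    """Map a legacy reference to its new PubID (if any), then look it up."""
    canonical = LEGACY_CSWP_IDS.get(ref, ref)
    return index.get(canonical)


print(resolve("NIST CSWP 04162018", INDEX))  # data/nist-cswp-6.yaml
```

The trade-off is that the alias table has to be carried in the code for as long as legacy references need to resolve.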

@ronaldtse
Contributor Author

@andrew2net instead of creating an index for the legacy IDs, I'd rather list them in a blog post (since it's a list that will never change) than carry this functionality in ongoing code.

Can you help do that? Thanks!

@jfnist

jfnist commented Aug 23, 2023 via email

@ronaldtse
Contributor Author

Thank you @jfnist ! I'll put this up on a blog post 😉 !

@ronaldtse
Contributor Author

@jfnist migrated the mapping into relaton/relaton.org#48

4 participants