-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue/72 new api allow for full fledged processing of protection profiles #466
base: main
Are you sure you want to change the base?
Issue/72 new api allow for full fledged processing of protection profiles #466
Conversation
@J08nY first batch of commits that refactored Some of my early design notes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK. But still has conflicts with main. Is the merge commit a real merge commit?
Meh, something was left out, should be fixed by now. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #466 +/- ##
==========================================
- Coverage 68.55% 66.94% -1.60%
==========================================
Files 62 68 +6
Lines 7934 8333 +399
==========================================
+ Hits 5438 5578 +140
- Misses 2496 2755 +259 ☔ View full report in Codecov by Sentry. |
Hey, the initial draft of the functionality is implemented. Some notes below. Sample usageCreate and fully process PP datasetpp_dset = ProtectionProfileDataset(root_dir="/path/to/pp/directory")
pp_dset.get_certs_from_web()
pp_dset.process_auxiliary_datasets()
pp_dset.download_all_artifacts()
pp_dset.convert_all_pdfs()
pp_dset.analyze_certificates() Acess to PP Dataset from CC Dataset
cc_dset.process_auxiliary_dataset(processed_pp_dataset_root_dir="/path/to/pp/directory) In such case, Alternatively, When Notes on PP processing
Next steps
|
pp_latest_full_archive: AnyHttpUrl = Field( | ||
"https://sec-certs.org/cc/pp.tar.gz", | ||
description="URL from where to fetch the latest full archive of fully processed PP dataset.", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pp_latest_snapshot
config also needs to change. It will no longer live on the /static/
subdir. But have the same layout as the CC and FIPS datasets. Could you make the change pls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I wanted to discuss this first before changing this.
"Trusted Computing", | ||
] | ||
|
||
CC_PORTAL_BASE_URL = "https://www.commoncriteriaportal.org" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember seeing a nuch of URLs to CC portal stuff duplicated (I guess in the CC sample and dataset class). Is it now unified under subpaths of this base url? It is a nice time to unify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I already did some unification, will double check!
@@ -25,17 +24,25 @@ | |||
|
|||
|
|||
class AuxiliaryDatasetHandler(ABC): | |||
def __init__(self, root_dir: str | Path) -> None: | |||
self.root_dir = Path(root_dir) | |||
RELATIVE_DIR: ClassVar[str | None] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the all caps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a constant
) | ||
|
||
if skip_schemes: | ||
del self.aux_handlers[CCSchemeDatasetHandler] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a good idea for it to delete the handler? It modifies the dataset state and I think that is quite unexpected.
Edit: Maybe modify the only_schemes
attr accordingly instead of deleting the handler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the tip. I needed something to commit on yesterday, I want to revisit this. It's a bad idea and I'm thinking about better design atm.
@@ -479,7 +493,7 @@ def _get_primary_key_str(row: Tag): | |||
x.st_link, | |||
None, | |||
None, | |||
profiles.get(x.dgst, None), | |||
None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the change? Does this mean we will have no links to PPs in the df?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I will enrich the DF with one extra attribute for that!
"protection_profiles": null, | ||
"eal": null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these entries null
, while the "protection_profile_links"
entry has a single PP element? Is it that the toy dataset was not processed fully?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Is in a state after get_certs_from_web()
IMO. These are heuristics attributes filled-in only in the analyze_dset()
step.
@@ -2,6 +2,7 @@ accessible-pygments==0.0.4 | |||
# via pydata-sphinx-theme | |||
aiohappyeyeballs==2.4.0 | |||
# via aiohttp | |||
aiohttp==3.10.11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This merge commit messed this up a bit. Same for the vulnerabilities notebook, but that is now fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, sorry for that. I'll bump the reqs once more before merging, treating also some open PRs.
Could you please add some test(s) that run the PP pipeline at least once? I.e. improve the coverage in the |
Sure 🙃 , see
|
Closes #72