chore: avoid known file signatures in datatypeId #155

Merged
gmaclennan merged 1 commit into main from chore/datatype-id-check
Oct 26, 2023

gmaclennan
Member

This is just a performance optimization for the Mapeo indexer so that it
avoids trying to parse files that are not Mapeo Docs. For example, a
hypercore might have PNG files written to it, and every PNG file starts
with the signature bytes '89 50 4E 47 0D 0A 1A 0A'. If we used those
bytes as a dataTypeId then the indexer would think any PNGs in the core
are a Mapeo datatype and try to parse them. Parsing would fail and the
files would just be ignored, but each attempt would still have a
performance cost.

This adds a check to the build script that throws an error if a new
dataType is added whose ID matches one of the known file signature
prefixes. In some cases we don't check against the whole file signature:
we just avoid starting data type IDs with byte(s) that are common in
file signatures.
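
A rough sketch of what such a build-time check could look like, not the code merged in this PR. The names `KNOWN_SIGNATURE_PREFIXES`, `findSignatureCollision` and `assertNoSignatureCollisions` are hypothetical, the IDs are assumed to be hex strings, and only the PNG signature comes from the description above; the other entries are well-known magic numbers included for illustration.

```typescript
// Hypothetical table of file-signature prefixes, as lowercase hex strings.
// Only the PNG entry is taken from the PR description.
const KNOWN_SIGNATURE_PREFIXES: { name: string; prefixHex: string }[] = [
  { name: 'png', prefixHex: '89504e470d0a1a0a' },
  { name: 'jpeg', prefixHex: 'ffd8ff' },
  { name: 'pdf', prefixHex: '25504446' }, // "%PDF"
  { name: 'zip', prefixHex: '504b0304' }, // "PK\x03\x04"
  { name: 'gif', prefixHex: '47494638' }, // "GIF8"
]

// Returns the name of the first signature that the ID collides with, or
// undefined if there is none. A collision means one hex string is a prefix
// of the other, so an ID shorter than the signature is still caught.
function findSignatureCollision(dataTypeIdHex: string): string | undefined {
  const id = dataTypeIdHex.toLowerCase()
  for (const { name, prefixHex } of KNOWN_SIGNATURE_PREFIXES) {
    const idStartsWithPrefix = id.slice(0, prefixHex.length) === prefixHex
    const prefixStartsWithId = prefixHex.slice(0, id.length) === id
    if (idStartsWithPrefix || prefixStartsWithId) return name
  }
  return undefined
}

// Throws on the first colliding ID so the build fails early.
function assertNoSignatureCollisions(
  dataTypeIds: { name: string; idHex: string }[]
): void {
  for (const { name, idHex } of dataTypeIds) {
    const hit = findSignatureCollision(idHex)
    if (hit !== undefined) {
      throw new Error(
        `dataTypeId for "${name}" (${idHex}) collides with the ${hit} file signature`
      )
    }
  }
}
```

Comparing in both directions matters when the ID is shorter than a signature: an ID equal to the first bytes of the PNG signature would still match the start of every PNG block.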
@gmaclennan gmaclennan self-assigned this Oct 9, 2023
@gmaclennan gmaclennan merged commit 01c3bf6 into main Oct 26, 2023
6 checks passed
@gmaclennan gmaclennan deleted the chore/datatype-id-check branch October 26, 2023 02:10