We sometimes run into portability issues because ScanCode depends on a few performance-critical libraries that use native C code. These require either specific pre-built wheels or a build toolchain, neither of which may be available on a user's machine.
We could maximize portability by falling back to pure Python implementations of these libraries. For the basic features in scancode-toolkit-mini, these would be pyahocorasick and intbitset. We maintain both libraries.
We should have degraded, slower but pure Python variants of these libraries so that we can install without these native dependencies, e.g. a scancode-toolkit variant that is pure Python. This would also help towards running in the browser.
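As a rough sketch of what such a fallback could look like for intbitset, the snippet below tries the native import first and otherwise uses a plain set subclass. The class name SetBackedIntbitset is hypothetical, and it only covers construction, add() and the standard set operators, not the full intbitset API.

```python
class SetBackedIntbitset(set):
    """Hypothetical pure Python stand-in: a set restricted to non-negative ints."""

    def __init__(self, iterable=()):
        super().__init__()
        for value in iterable:
            self.add(value)

    def add(self, value):
        # Reject anything a bitset of integers could not hold.
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"expected a non-negative integer, got {value!r}")
        super().add(value)


try:
    # Fast path: the native C extension, when a wheel or toolchain is available.
    from intbitset import intbitset
except ImportError:
    # Degraded path: slower and more memory hungry, but installable anywhere.
    intbitset = SetBackedIntbitset


a = intbitset([1, 2, 3])
b = intbitset([2, 3, 4])
print(sorted(a & b), sorted(a | b))
```

Callers keep importing a single name, so the choice between the native and pure Python variant stays in one place.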
Key libraries:
pyahocorasick: could expand the built-in, simpler pure Python implementation to implement the pyahocorasick APIs
intbitset: could roll out a simple set-based fallback
lxml (and other libs based on it such as xmldict): could use stdlib xml.etree instead
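To give an idea of the size of a pyahocorasick fallback, here is a minimal pure Python Aho-Corasick sketch. It is not the existing built-in implementation. The class name PurePythonAutomaton is hypothetical, and it mirrors only three pyahocorasick Automaton methods: add_word(), make_automaton() and iter(), which yields (end_index, value) pairs.

```python
from collections import deque


class PurePythonAutomaton:
    """Hypothetical stand-in for a small subset of ahocorasick.Automaton."""

    def __init__(self):
        # Node 0 is the root; each node has transitions, a failure link, outputs.
        self._goto = [{}]
        self._fail = [0]
        self._out = [[]]

    def add_word(self, key, value):
        node = 0
        for char in key:
            child = self._goto[node].get(char)
            if child is None:
                child = len(self._goto)
                self._goto[node][char] = child
                self._goto.append({})
                self._fail.append(0)
                self._out.append([])
            node = child
        # Adding the same key again replaces its value.
        self._out[node] = [value]

    def make_automaton(self):
        # Breadth-first pass computing failure links; call once after all add_word().
        queue = deque(self._goto[0].values())
        while queue:
            node = queue.popleft()
            for char, child in self._goto[node].items():
                queue.append(child)
                fail = self._fail[node]
                while fail and char not in self._goto[fail]:
                    fail = self._fail[fail]
                self._fail[child] = self._goto[fail].get(char, 0)
                # Inherit matches that end at the failure target.
                self._out[child] = self._out[child] + self._out[self._fail[child]]

    def iter(self, text):
        node = 0
        for index, char in enumerate(text):
            while node and char not in self._goto[node]:
                node = self._fail[node]
            node = self._goto[node].get(char, 0)
            for value in self._out[node]:
                yield index, value


automaton = PurePythonAutomaton()
for word in ("he", "she", "his", "hers"):
    automaton.add_word(word, word)
automaton.make_automaton()
print(sorted(automaton.iter("ushers")))
```

A real fallback would also need the rest of the API that ScanCode relies on, and it would be noticeably slower than the C version on large inputs.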
For lxml there are common idioms such as:
    try:
        from lxml import etree
    except ImportError:
        import xml.etree.ElementTree as etree
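This idiom only stays portable if the calling code sticks to the API that both modules share, such as fromstring(), iter(), find() and findall(). lxml-only features such as the xpath() method have no stdlib equivalent. A small illustration, where the helper collect_text and the sample document are made up for this example:

```python
try:
    from lxml import etree  # fast path when the native wheel is installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback


def collect_text(xml_text, tag):
    """Return the text of every element named `tag`, in document order."""
    root = etree.fromstring(xml_text)
    return [element.text for element in root.iter(tag)]


sample = "<project><license>mit</license><license>apache-2.0</license></project>"
print(collect_text(sample, "license"))
```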
Hi @pombredanne! I would like to contribute to this issue as part of Google Summer of Code (GSoC) 2023. I am going through the code repository. I have a fair knowledge of Python and I am well versed in data structures and algorithms. Could you help me get started on this issue?