Refactor and Relocate Native Library Pattern Loading #320

Merged
merged 57 commits into from
Jan 14, 2025
6e7e488
Added plugin list subcommand functionality
willis89pr Nov 4, 2024
45a1b6d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 4, 2024
d141ce5
changed subcommand from list to display to avoid python builtin redef…
willis89pr Nov 4, 2024
75afc0c
changed subcommand from list to display to avoid python builtin redef…
willis89pr Nov 5, 2024
555490d
changed subcommand from list to display to avoid python builtin redef…
willis89pr Nov 5, 2024
dbd9fba
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 5, 2024
79a3675
fixed import function
willis89pr Nov 5, 2024
229d8b1
fixed import function
willis89pr Nov 5, 2024
4126841
Merge branch 'main' into CYT-828-add-plugin-command
willis89pr Nov 5, 2024
be48bb5
changed subcommand name to list in click decorator
willis89pr Nov 5, 2024
efb07c6
Merge remote-tracking branch 'origin/CYT-828-add-plugin-command' into…
willis89pr Nov 5, 2024
3f911a5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 5, 2024
261cc6e
Changed subcommand definition name. Added docstring description to su…
willis89pr Nov 5, 2024
fabf213
Merge remote-tracking branch 'origin/CYT-828-add-plugin-command' into…
willis89pr Nov 5, 2024
5e20458
Changed subcommand definition name. Added docstring description to su…
willis89pr Nov 5, 2024
b88b0e7
Added boilerplate for disable command.
willis89pr Nov 5, 2024
52e0703
Added disable plugin functionality.
willis89pr Nov 6, 2024
768574d
Added disable subcommand to main.
willis89pr Nov 6, 2024
3a8342c
Added variables for config section and key and changed section to core.
willis89pr Nov 6, 2024
da8554f
Re-implemented disable command with functionality in surfactant/plugi…
willis89pr Nov 6, 2024
be12bfa
Added print disabled plugins to list subcommand
willis89pr Nov 6, 2024
4d27807
Added print disabled plugins to list subcommand.
willis89pr Nov 6, 2024
62657de
Added plugin enable functionality.
willis89pr Nov 11, 2024
40babc5
Merge branch 'main' into CYT-1123-plugin-enable-disable
willis89pr Nov 11, 2024
7ac000a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 11, 2024
c0b4a0b
Update surfactant/cmd/plugin.py
willis89pr Nov 11, 2024
093b047
Update surfactant/cmd/plugin.py
willis89pr Nov 11, 2024
dd4b055
Save.
willis89pr Nov 12, 2024
0ec3b91
Added set_plugins function. Changed formatting of print_plugins
willis89pr Nov 12, 2024
03355a2
Added print_plugins in list command and changed formatting
willis89pr Nov 12, 2024
814d9aa
Merge branch 'CYT-1123-plugin-enable-disable' of https://github.com/L…
willis89pr Nov 12, 2024
c827e88
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2024
6b78b8d
Update surfactant/plugin/manager.py
willis89pr Nov 12, 2024
4a9f744
Updating local
willis89pr Nov 12, 2024
3ce1d93
Merge branch 'main' into CYT-1123-plugin-enable-disable
willis89pr Nov 12, 2024
ddebfef
Added command declarations in main.
willis89pr Nov 12, 2024
b51f35c
Merged deleted branches.
willis89pr Nov 18, 2024
6707d3a
Merge branch 'main' of https://github.com/LLNL/Surfactant
willis89pr Nov 18, 2024
80bd3f7
Merge branch 'main' of https://github.com/LLNL/Surfactant
willis89pr Dec 11, 2024
f2182e7
Merge branch 'main' of https://github.com/LLNL/Surfactant
willis89pr Dec 18, 2024
1a5d7a4
Merge branch 'main' of https://github.com/LLNL/Surfactant
willis89pr Jan 6, 2025
9c901bf
Merge branch 'main' of https://github.com/LLNL/Surfactant
willis89pr Jan 13, 2025
ff95eab
Refactor native library pattern loading into a class
willis89pr Jan 13, 2025
4c910a8
Moved native_libraries.get_emba_db.py script to infoextractors.native…
willis89pr Jan 13, 2025
44fe268
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 13, 2025
51f82d7
Merge branch 'main' into init-nativelib
willis89pr Jan 13, 2025
26bedf9
Removed line that skipped updating if there was an error in parsing.
willis89pr Jan 13, 2025
1617656
fetch.
willis89pr Jan 13, 2025
3bfbeab
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 13, 2025
a00ddfa
Added typing.
willis89pr Jan 14, 2025
82630c9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 14, 2025
b8ccc56
Changed function name.
willis89pr Jan 14, 2025
024ff7d
Merge branch 'init-nativelib' of https://github.com/LLNL/Surfactant i…
willis89pr Jan 14, 2025
b645f85
Merge branch 'main' into init-nativelib
willis89pr Jan 14, 2025
b7884a4
Added check in parse function.
willis89pr Jan 14, 2025
c73ef3e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 14, 2025
74ad068
Update surfactant/infoextractors/native_lib_file.py
willis89pr Jan 14, 2025
92 changes: 0 additions & 92 deletions scripts/native_libraries/get_emba_db.py

This file was deleted.

172 changes: 140 additions & 32 deletions surfactant/infoextractors/native_lib_file.py
@@ -1,73 +1,77 @@
 import json
 import os
 import re
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List, Optional, Union

 import requests
 from loguru import logger

 import surfactant.plugin
 from surfactant.configmanager import ConfigManager
 from surfactant.sbomtypes import SBOM, Software


-@surfactant.plugin.hookimpl
-def short_name() -> Optional[str]:
-    return "native_lib_patterns"
+class NativeLibDatabaseManager:
+    def __init__(self) -> None:
+        self.native_lib_database: Optional[Dict[str, Any]] = None

+    def load_db(self) -> None:
+        native_lib_file = ConfigManager().get_data_dir_path() / "native_lib_patterns" / "emba.json"

-def load_pattern_db():
-    # Load regex patterns into database var
-    try:
-        with open(native_lib_patterns, "r") as regex:
-            emba_patterns = json.load(regex)
-            return emba_patterns
-    except FileNotFoundError:
-        logger.warning(f"File not found for native library detection: {native_lib_patterns}")
-        return None
+        try:
+            with open(native_lib_file, "r") as regex:
+                self.native_lib_database = json.load(regex)
+        except FileNotFoundError:
+            logger.warning(
+                "Native library pattern could not be loaded. Run `surfactant plugin update-db native_lib_patterns` to fetch the pattern database."
+            )
+            self.native_lib_database = None

+    def get_database(self) -> Optional[Dict[str, Any]]:
+        return self.native_lib_database

-# Load the pattern database once at module import
-native_lib_patterns = ConfigManager().get_data_dir_path() / "native_lib_patterns" / "emba.json"
-database = load_pattern_db()

+native_lib_manager = NativeLibDatabaseManager()

-def supports_file(filetype) -> bool:

+def supports_file(filetype: str) -> bool:
     return filetype in ("PE", "ELF", "MACHOFAT", "MACHOFAT64", "MACHO32", "MACHO64")


 @surfactant.plugin.hookimpl
-def extract_file_info(sbom: SBOM, software: Software, filename: str, filetype: str) -> object:
+def extract_file_info(
+    sbom: SBOM, software: Software, filename: str, filetype: str
+) -> Optional[Dict[str, Any]]:
     if not supports_file(filetype):
         return None
     return extract_native_lib_info(filename)


-def extract_native_lib_info(filename):
+def extract_native_lib_info(filename: str) -> Optional[Dict[str, Any]]:
     native_lib_info: Dict[str, Any] = {"nativeLibraries": []}
-    if not database:
+    native_lib_database = native_lib_manager.get_database()

+    if native_lib_database is None:
         return None

-    found_libraries = set()
-    library_names = []
-    contains_library_names = []
+    found_libraries: set = set()
+    library_names: List[str] = []
+    contains_library_names: List[str] = []

     # Match based on filename
     base_filename = os.path.basename(filename)
-    filenames_list = match_by_attribute("filename", base_filename, database)
+    filenames_list = match_by_attribute("filename", base_filename, native_lib_database)
     if len(filenames_list) > 0:
         for match in filenames_list:
             library_name = match["isLibrary"]
             if library_name not in found_libraries:
                 library_names.append(library_name)
                 found_libraries.add(library_name)

     # Match based on filecontent
     try:
         with open(filename, "rb") as native_file:
             filecontent = native_file.read()
-            filecontent_list = match_by_attribute("filecontent", filecontent, database)
+            filecontent_list = match_by_attribute("filecontent", filecontent, native_lib_database)

             # Extend the list and add the new libraries found
             for match in filecontent_list:
                 library_name = match["containsLibrary"]
                 if library_name not in found_libraries:
@@ -77,19 +81,19 @@ def extract_native_lib_info(filename):
     except FileNotFoundError:
         logger.warning(f"File not found: {filename}")

     # Create the single entry for isLibrary
     if library_names:
         native_lib_info["nativeLibraries"].append({"isLibrary": library_names})

     # Create the single entry for containsLibrary
     if contains_library_names:
         native_lib_info["nativeLibraries"].append({"containsLibrary": contains_library_names})

     return native_lib_info


-def match_by_attribute(attribute: str, content: str, patterns_database: Dict) -> List[Dict]:
-    libs = []
+def match_by_attribute(
+    attribute: str, content: Union[str, bytes], patterns_database: Dict[str, Any]
+) -> List[Dict[str, Any]]:
+    libs: List[Dict[str, str]] = []
     for lib_name, lib_info in patterns_database.items():
         if attribute in lib_info:
             for pattern in lib_info[attribute]:
@@ -102,3 +106,107 @@ def match_by_attribute(attribute: str, content: str, patterns_database: Dict) ->
                     if matches:
                         libs.append({"containsLibrary": lib_name})
     return libs


+def download_database() -> Optional[str]:
+    emba_database_url = "https://raw.githubusercontent.com/e-m-b-a/emba/11d6c281189c3a14fc56f243859b0bccccce8b9a/config/bin_version_strings.cfg"
+    response = requests.get(emba_database_url)
+    if response.status_code == 200:
+        logger.info("Request successful!")
+        return response.text

+    if response.status_code == 404:
+        logger.error("Resource not found.")
+    else:
+        logger.error("An error occurred.")

+    return None


+def parse_emba_cfg_file(content: str) -> Dict[str, Dict[str, List[str]]]:
+    database: Dict[str, Dict[str, List[str]]] = {}
+    lines = content.splitlines()
+    filtered_lines: List[str] = []

+    for line in lines:
+        if not (line.startswith("#") or line.startswith("identifier")):
+            filtered_lines.append(line)

+    for line in filtered_lines:
+        line = line.strip()

+        fields = line.split(";")

+        lib_name = fields[0]

+        name_patterns: List[str] = []

+        if fields[3].startswith('"') and fields[3].endswith('""'):
+            filecontent = fields[3][1:-1]
+        elif fields[3].endswith('""'):
+            filecontent = fields[3][:-1]
+        else:
+            filecontent = fields[3].strip('"')

+        if fields[1] == "" or fields[1] == "strict":
+            if fields[1] == "strict":
+                if lib_name not in database:
+                    database[lib_name] = {
+                        "filename": [lib_name],
+                        "filecontent": [],
+                    }
+                else:
+                    if lib_name not in database[lib_name]["filename"]:
+                        database[lib_name]["filename"].append(lib_name)
+            else:
+                try:
+                    re.search(filecontent.encode("utf-8"), b"")
+                    if lib_name not in database:
+                        database[lib_name] = {
+                            "filename": name_patterns,
+                            "filecontent": [filecontent],
+                        }
+                    else:
+                        database[lib_name]["filecontent"].append(filecontent)
+                except re.error as e:
+                    logger.error(f"Error parsing file content regexp {filecontent}: {e}")

+    return database


+@surfactant.plugin.hookimpl
+def update_db() -> str:
+    file_content = download_database()
+    if file_content is not None:
+        parsed_data = parse_emba_cfg_file(file_content)
+        for _, value in parsed_data.items():
+            filecontent_list = value["filecontent"]

+            for i, pattern in enumerate(filecontent_list):
+                if pattern.startswith("^"):
+                    filecontent_list[i] = pattern[1:]

+                if not pattern.endswith("\\$"):
+                    if pattern.endswith("$"):
+                        filecontent_list[i] = pattern[:-1]

+        path = ConfigManager().get_data_dir_path() / "native_lib_patterns"
+        path.mkdir(parents=True, exist_ok=True)
+        native_lib_file = ConfigManager().get_data_dir_path() / "native_lib_patterns" / "emba.json"
+        with open(native_lib_file, "w") as json_file:
+            json.dump(parsed_data, json_file, indent=4)
+        return "Update complete."
+    return "No update occurred."


+@surfactant.plugin.hookimpl
+def short_name() -> Optional[str]:
+    return "native_lib_patterns"


+@surfactant.plugin.hookimpl
+def init_hook(command_name: Optional[str] = None) -> None:
+    if command_name != "update-db":
+        logger.info("Initializing native_lib_file...")
+        native_lib_manager.load_db()
+        logger.info("Initializing native_lib_file complete.")
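
Note on the anchor stripping in `update_db`: both assignments slice the loop variable `pattern`, which still holds the original string. For a pattern with both a leading `^` and a trailing unescaped `$`, the second assignment therefore overwrites the first and the `^` is kept. The following is a minimal sketch of one way to strip both, assuming that is the intent. `strip_anchors` and `strip_all` are hypothetical helpers for illustration and are not part of this PR.

```python
from typing import List


def strip_anchors(pattern: str) -> str:
    """Drop a leading '^' and a trailing unescaped '$' from a regex string."""
    # Rebinding the local name means the second check sees the first change.
    if pattern.startswith("^"):
        pattern = pattern[1:]
    # A pattern ending in backslash-dollar is left alone, as in the diff.
    if pattern.endswith("$") and not pattern.endswith("\\$"):
        pattern = pattern[:-1]
    return pattern


def strip_all(patterns: List[str]) -> List[str]:
    """Apply strip_anchors to every pattern, returning a new list."""
    return [strip_anchors(p) for p in patterns]
```

Building a new list instead of assigning into `filecontent_list[i]` avoids mixing the original and the partly stripped string.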
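
The class added in this diff defers reading `emba.json` until `init_hook` calls `load_db()`, rather than reading it at module import. Below is a standalone sketch of that deferred-load pattern, not the plugin's actual code. The class name `PatternDatabaseManager`, the explicit `db_path` argument, and the sample `zlib` entry are assumptions for illustration; the PR resolves the path through `ConfigManager`.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


class PatternDatabaseManager:
    """Holds a JSON pattern database that is loaded on demand, not at import."""

    def __init__(self, db_path: Path) -> None:
        self.db_path = db_path
        self.database: Optional[Dict[str, Any]] = None

    def load_db(self) -> None:
        try:
            with open(self.db_path, "r") as f:
                self.database = json.load(f)
        except FileNotFoundError:
            # A missing file is not fatal; callers see None and skip matching.
            self.database = None

    def get_database(self) -> Optional[Dict[str, Any]]:
        return self.database


def demo() -> tuple:
    """Load once before the file exists and once after it has been written."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "emba.json"
        manager = PatternDatabaseManager(path)
        manager.load_db()
        before = manager.get_database()  # file does not exist yet
        sample = {"zlib": {"filename": ["zlib"], "filecontent": []}}
        path.write_text(json.dumps(sample))
        manager.load_db()
        after = manager.get_database()
    return before, after
```

Calling `demo()` returns the database state before and after the file is written, which mirrors running the plugin before and after `update-db` has fetched the patterns.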