[WIP] Add ramalama rag command #501

rhatdan · 2024-12-03T20:08:08Z

Summary by Sourcery

Add a new 'rag' command to the RamaLama CLI for generating RAG data from documents and converting it into an OCI Image, along with corresponding documentation updates.

New Features:

Introduce a new 'rag' command in the RamaLama CLI to generate Retrieval Augmented Generation (RAG) data from various document formats and convert it into an OCI Image.

Documentation:

Add documentation for the new 'rag' command, detailing its usage, options, and description.

sourcery-ai · 2024-12-03T20:08:13Z

Reviewer's Guide by Sourcery

This PR adds a new 'rag' command to the RamaLama CLI that enables generating Retrieval Augmented Generation (RAG) data from various document formats and converting them into an OCI Image. The implementation includes a new command parser, core RAG functionality for document processing, and corresponding documentation.

Sequence diagram for the RAG command execution

sequenceDiagram
    actor User
    participant CLI as RamaLama CLI
    participant rag as rag
    participant Converter as DocumentConverter
    participant Result as ConversionResult

    User->>CLI: Execute 'ramalama rag PATH IMAGE'
    CLI->>rag: rag_cli(args)
    rag->>Converter: convert_all(targets)
    Converter->>Result: ConversionResult
    rag->>Result: export_documents(conv_results, output_dir)
    Result-->>rag: Success/Failure counts
    rag-->>CLI: Processed results
    CLI-->>User: Display results

Class diagram for the new RAG functionality

classDiagram
    class DocumentConverter {
        +convert_all(targets, raises_on_error)
    }

    class ConversionResult {
        +input: File
        +status: ConversionStatus
        +document: Document
        +errors: List<Error>
    }

    class ConversionStatus {
        <<enumeration>>
        SUCCESS
        PARTIAL_SUCCESS
        FAILURE
    }

    class Document {
        +export_to_dict()
        +export_to_document_tokens()
        +export_to_markdown(strict_text)
    }

    class rag {
        +generate(args)
        +walk(path)
        +export_documents(conv_results, output_dir)
    }

    DocumentConverter --> ConversionResult
    ConversionResult --> ConversionStatus
    ConversionResult --> Document
    rag --> DocumentConverter
    rag --> ConversionResult
    rag --> ConversionStatus
    rag --> Document

File-Level Changes

Change	Details	Files
Added new RAG command to CLI interface	Added rag_parser function to register the new command; added rag_cli function as the command handler; updated command list to include the new RAG command; added command-line arguments for input paths and output image name	`ramalama/cli.py`
Implemented core RAG functionality for document processing	Created document walking function to recursively find target files; implemented document conversion using DocumentConverter; added export functionality for multiple formats (JSON, YAML, doctags, markdown, text); added error handling and conversion status reporting	`ramalama/rag.py`
Added documentation for the new RAG command	Created man page for ramalama-rag command; updated main ramalama documentation to include RAG command; fixed capitalization consistency in info command help text	`docs/ramalama.1.md` `docs/ramalama-rag.1.md` `docs/ramalama-info.1.md`
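The CLI changes above can be sketched with argparse subparsers. This is a minimal illustration, not the code in `ramalama/cli.py`: the help strings, the `main` wrapper and the placeholder body of `rag_cli` are assumptions. It shows how a variable-length `PATH` list followed by a single `IMAGE` positional parses, matching `ramalama rag PATH IMAGE` from the sequence diagram.

```python
import argparse


def rag_parser(subparsers):
    # Hypothetical sketch: the real ramalama/cli.py may name and document
    # these arguments differently.
    parser = subparsers.add_parser(
        "rag",
        help="Generate RAG data from documents and store it in an OCI image",
    )
    # One or more files or directories; argparse leaves the last positional
    # for IMAGE even though PATH is greedy.
    parser.add_argument(
        "PATH",
        nargs="+",
        help="Files or directories containing documents to convert",
    )
    parser.add_argument("IMAGE", help="Name of the OCI image to write")
    parser.set_defaults(func=rag_cli)


def rag_cli(args):
    # Placeholder handler: a real handler would walk args.PATH, convert the
    # documents with docling and build args.IMAGE.
    return args.PATH, args.IMAGE


def main(argv=None):
    parser = argparse.ArgumentParser(prog="ramalama")
    subparsers = parser.add_subparsers(dest="subcommand", required=True)
    rag_parser(subparsers)
    args = parser.parse_args(argv)
    return args.func(args)
```

For example, `main(["rag", "a.pdf", "docs", "myimage"])` dispatches to `rag_cli` with `PATH == ["a.pdf", "docs"]` and `IMAGE == "myimage"`.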


sourcery-ai

Hey @rhatdan - I've reviewed your changes - here's some feedback:

Overall Comments:

Consider improving error message clarity in rag.py - 'The example failed converting X on Y' is confusing. Consider something like 'Failed to convert X out of Y documents'.
Help text capitalization is inconsistent across commands (e.g. 'display' vs 'Display'). Consider standardizing on sentence case for all help messages.
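The first overall comment could be addressed with a message built from the three counts that `export_documents` already returns. The helper name `conversion_summary` and its exact wording are hypothetical; only the "Failed to convert X out of Y documents" phrasing comes from the review.

```python
def conversion_summary(success_count, partial_success_count, failure_count):
    # Hypothetical helper: replaces "The example failed converting X on Y"
    # with the clearer wording suggested in the review.
    total = success_count + partial_success_count + failure_count
    message = f"Failed to convert {failure_count} out of {total} documents"
    if partial_success_count:
        # Partial conversions are reported separately so they are not hidden.
        message += f" ({partial_success_count} partially converted)"
    return message + "."
```

With counts of 3, 1 and 2 this yields "Failed to convert 2 out of 6 documents (1 partially converted)."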

Here's what I looked at during the review

🟡 General issues: 2 issues found
🟢 Security: all looks good
🟢 Testing: all looks good
🟡 Complexity: 1 issue found
🟢 Documentation: all looks good


sourcery-ai · 2024-12-03T20:09:08Z

ramalama/rag.py

+_log = logging.getLogger(__name__)
+
+
+def walk(path):


issue: The walk() function should consistently return a list in all cases

It currently returns None for empty directories but a list otherwise, so a caller that iterates over the result would fail with a TypeError when given an empty directory. Consider returning an empty list for empty directories and removing the early return.
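A version of `walk()` that always returns a list might look like the sketch below. It is an assumption about intent, not the PR's code: it accepts either a single file or a directory, does not filter by document type, and sorts the result so runs are reproducible.

```python
import os
from pathlib import Path


def walk(path):
    """Return every regular file under *path* as a list (possibly empty).

    Sketch only: the PR's walk() may also filter by file extension.
    """
    p = Path(path)
    if p.is_file():
        return [str(p)]
    targets = []
    # os.walk yields nothing for an empty or missing directory, so the
    # function falls through to returning [] instead of None.
    for root, _dirs, files in os.walk(p):
        for name in files:
            targets.append(os.path.join(root, name))
    return sorted(targets)
```

Because there is no early `return`, every code path produces a list and callers can iterate over the result unconditionally.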

sourcery-ai · 2024-12-03T20:09:08Z

ramalama/rag.py

+
+    converter = DocumentConverter()
+    conv_results = converter.convert_all(targets, raises_on_error=False)
+    success_count, partial_success_count, failure_count = export_documents(conv_results, output_dir=Path("scratch"))


suggestion: Avoid hardcoding the scratch directory path

Consider making this configurable or using a constant defined at module level:

    OUTPUT_DIR = Path("scratch")  # At module level, near top of file

    success_count, partial_success_count, failure_count = export_documents(conv_results, output_dir=OUTPUT_DIR)
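For the "configurable" option, one approach is to let the caller pass a directory (for example from a CLI option) and otherwise fall back to a fresh temporary directory, so concurrent runs do not overwrite each other's files in `./scratch`. The helper `resolve_output_dir` and the `ramalama-rag-` prefix are hypothetical.

```python
import tempfile
from pathlib import Path


def resolve_output_dir(output_dir=None):
    """Return the directory where converted documents are written.

    Hypothetical helper: an explicit path wins; otherwise a new temporary
    directory is created for this run.
    """
    if output_dir is not None:
        path = Path(output_dir)
        path.mkdir(parents=True, exist_ok=True)
        return path
    return Path(tempfile.mkdtemp(prefix="ramalama-rag-"))
```

The temporary directory is not deleted automatically; whoever builds the image from it would be responsible for cleaning it up afterwards.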

sourcery-ai · 2024-12-03T20:09:08Z

ramalama/rag.py

+    return targets
+
+
+def export_documents(


issue (complexity): Consider using a format mapping dictionary to handle document exports

The export_documents function contains significant repetition that can be simplified without losing clarity. Consider restructuring using a format mapping:

    # Imports assumed to be present in ramalama/rag.py
    import json
    import logging
    from pathlib import Path
    from typing import Iterable

    import yaml
    from docling.datamodel.base_models import ConversionStatus
    from docling.datamodel.document import ConversionResult

    _log = logging.getLogger(__name__)

    # Maps each output file suffix to the function that renders a document in that format
    EXPORT_FORMATS = {
        '.json': lambda doc: json.dumps(doc.export_to_dict()),
        '.yaml': lambda doc: yaml.safe_dump(doc.export_to_dict()),
        '.doctags.txt': lambda doc: doc.export_to_document_tokens(),
        '.md': lambda doc: doc.export_to_markdown(),
        '.txt': lambda doc: doc.export_to_markdown(strict_text=True),
    }


    def export_documents(conv_results: Iterable[ConversionResult], output_dir: Path):
        output_dir.mkdir(parents=True, exist_ok=True)
        stats = {'success': 0, 'partial': 0, 'failed': 0}

        for conv_res in conv_results:
            if conv_res.status == ConversionStatus.SUCCESS:
                stats['success'] += 1
                doc_filename = conv_res.input.file.stem
                # Write one file per registered export format
                for ext, export_fn in EXPORT_FORMATS.items():
                    output_file = output_dir / f"{doc_filename}{ext}"
                    with output_file.open('w') as fp:
                        fp.write(export_fn(conv_res.document))
            elif conv_res.status == ConversionStatus.PARTIAL_SUCCESS:
                stats['partial'] += 1
                _log.info(f"Document {conv_res.input.file} was partially converted with the following errors:")
                for item in conv_res.errors:
                    _log.info(f"\t{item.error_message}")
            else:
                stats['failed'] += 1
                _log.info(f"Document {conv_res.input.file} failed to convert.")

        _log.info(f"Processed {sum(stats.values())} docs, of which {stats['failed']} failed "
                  f"and {stats['partial']} were partially converted.")
        return stats['success'], stats['partial'], stats['failed']

This approach:

Separates format configuration from export logic

Reduces code duplication

Makes adding new export formats easier

Maintains explicit error handling and logging
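To see the mapping approach work in isolation (without docling or PyYAML installed), the sketch below substitutes a hypothetical `StubDocument` that mimics only the export methods the table calls. Adding a new output format means adding one entry to `EXPORT_FORMATS`; the loop that writes files does not change.

```python
import json
from pathlib import Path


class StubDocument:
    """Hypothetical stand-in for docling's Document, for illustration only."""

    def __init__(self, title, body):
        self.title = title
        self.body = body

    def export_to_dict(self):
        return {"title": self.title, "body": self.body}

    def export_to_markdown(self, strict_text=False):
        if strict_text:
            return f"{self.title}\n\n{self.body}"
        return f"# {self.title}\n\n{self.body}"


# Suffix -> renderer; a new format is a one-line addition here.
EXPORT_FORMATS = {
    ".json": lambda doc: json.dumps(doc.export_to_dict()),
    ".md": lambda doc: doc.export_to_markdown(),
    ".txt": lambda doc: doc.export_to_markdown(strict_text=True),
}


def export_document(doc, stem, output_dir):
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for ext, export_fn in EXPORT_FORMATS.items():
        output_file = output_dir / f"{stem}{ext}"
        output_file.write_text(export_fn(doc), encoding="utf-8")
        written.append(output_file.name)
    return written
```

Since dictionaries preserve insertion order, the files are written in the order the formats are listed.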

rhatdan · 2024-12-06T21:21:19Z

@ericcurtin any idea why docling is not being found on Mac? It seems to be found on my Mac.
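The thread does not say why docling is missing on the Mac runner. One possible cause is that docling was installed into a different Python interpreter than the one running ramalama. A guarded check like the hypothetical `require_module` below reports which interpreter failed to find the module, instead of failing with a bare `ModuleNotFoundError` at import time. It uses only `importlib.util.find_spec` and `sys.exit` from the standard library.

```python
import importlib.util
import sys


def require_module(name, hint):
    """Exit with a readable message if *name* cannot be imported.

    Hypothetical helper: names the interpreter that was searched, which
    helps spot a mismatch between the pip and python in use.
    """
    if importlib.util.find_spec(name) is None:
        sys.exit(f"Error: Python module '{name}' not found by {sys.executable}. {hint}")
```

A caller could run `require_module("docling", "Install it with: pip install docling")` before importing docling in the `rag` code path.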


sourcery-ai bot reviewed Dec 3, 2024

rhatdan force-pushed the rag branch 7 times, most recently from ff9f7fa to 97a74c1, on December 6, 2024 21:17

rhatdan force-pushed the rag branch from 97a74c1 to 92a4927 on December 9, 2024 20:31

Add ramalama rag command

aebdc8e

Allow users to specify Docx, PDF, Markdown ... files on the command line, process them with docling and RAG, and finally put the output into the specified container image.

Signed-off-by: Daniel J Walsh <[email protected]>
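The last step in the commit message, putting the output into the specified container image, could look roughly like this sketch. The `FROM scratch` base image and the `/rag` destination are assumptions the PR does not state; `podman build -t NAME -f FILE CONTEXT` is the standard invocation, and `docker build` accepts the same flags.

```python
def containerfile_for(output_dir, dest="/rag"):
    # Hypothetical layout: copy the converted documents into an otherwise
    # empty image. output_dir must be relative to the build context.
    return f"FROM scratch\nCOPY {output_dir} {dest}\n"


def build_command(image, containerfile, context="."):
    # Argument list suitable for subprocess.run(cmd, check=True).
    return ["podman", "build", "-t", image, "-f", str(containerfile), context]
```

For example, after writing `containerfile_for("scratch")` to a file named `Containerfile`, `build_command("quay.io/example/rag:latest", "Containerfile")` produces the argument list for building the (hypothetical) image from the current directory.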

rhatdan force-pushed the rag branch from 92a4927 to aebdc8e on December 10, 2024 22:29
