feat: Add PDF loader to document loaders #25

Tachikoma000 · 2024-09-18T22:18:46Z

Implement PdfLoader struct in src/document_loaders/pdf.rs
Add PdfLoader to document_loaders module
Implement DocumentLoader trait for PdfLoader
Use lopdf crate for PDF parsing
Add error handling for file operations and PDF parsing
Update Cargo.toml with lopdf dependency
Add unit tests for PdfLoader
Update documentation with PdfLoader usage examples

Tachikoma000 · 2024-09-18T22:22:02Z

Add PDF Loader to Document Loaders

This PR implements a PDF loader as part of the document loaders module in Rig. It allows users to easily load and process PDF documents for use in RAG systems and other NLP tasks.

Changes

Implemented PdfLoader struct in src/document_loaders/pdf.rs
Added PdfLoader to the document_loaders module
Implemented DocumentLoader trait for PdfLoader
Used the lopdf crate for PDF parsing
Added error handling for file operations and PDF parsing
Updated Cargo.toml with the lopdf dependency
Added unit tests for PdfLoader
Updated documentation with PdfLoader usage examples

Implementation Details

The PdfLoader uses the lopdf crate to parse PDF files and extract their text content. It reports errors such as a missing file or a PDF that fails to parse. The extracted text is then converted into DocumentEmbeddings for further processing in Rig.
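A minimal sketch of the error-handling shape described above, using only the standard library. Everything beyond the names `PdfLoader` and `DocumentLoader` is an assumption for illustration: the `PdfLoaderError` enum, the `load` signature, and the `extract_pages` helper are hypothetical and not taken from the PR. The lopdf parsing step is replaced here by a simple `%PDF-` header check, and the conversion to DocumentEmbeddings is left out.

```rust
use std::fmt;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical error type separating file errors from parse errors.
#[derive(Debug)]
pub enum PdfLoaderError {
    /// The file could not be read (missing, permission denied, ...).
    Io(io::Error),
    /// The bytes were read but do not look like a PDF.
    Parse(String),
}

impl fmt::Display for PdfLoaderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PdfLoaderError::Io(e) => write!(f, "could not read PDF file: {}", e),
            PdfLoaderError::Parse(msg) => write!(f, "could not parse PDF: {}", msg),
        }
    }
}

impl std::error::Error for PdfLoaderError {}

/// Lets `?` turn an `io::Error` into a `PdfLoaderError::Io`.
impl From<io::Error> for PdfLoaderError {
    fn from(e: io::Error) -> Self {
        PdfLoaderError::Io(e)
    }
}

/// Assumed trait shape; the real `DocumentLoader` trait may differ.
pub trait DocumentLoader {
    /// Returns one string of extracted text per page.
    fn load(&self) -> Result<Vec<String>, PdfLoaderError>;
}

pub struct PdfLoader {
    path: PathBuf,
}

impl PdfLoader {
    pub fn new(path: impl AsRef<Path>) -> Self {
        Self { path: path.as_ref().to_path_buf() }
    }
}

/// Stand-in for the lopdf parsing step: it only validates the header
/// and returns no pages. A real implementation would walk the page
/// tree and extract the text of each page.
pub fn extract_pages(bytes: &[u8]) -> Result<Vec<String>, PdfLoaderError> {
    if !bytes.starts_with(b"%PDF-") {
        return Err(PdfLoaderError::Parse("missing %PDF- header".to_string()));
    }
    Ok(Vec::new())
}

impl DocumentLoader for PdfLoader {
    fn load(&self) -> Result<Vec<String>, PdfLoaderError> {
        // A missing file surfaces as `PdfLoaderError::Io` via `From`.
        let bytes = fs::read(&self.path)?;
        extract_pages(&bytes)
    }
}
```

Keeping the two failure modes as separate variants lets a caller distinguish "file not found" from "not a valid PDF" without inspecting error strings.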

Testing

Unit tests have been added to ensure the PdfLoader correctly loads PDF files and handles various edge cases. The tests cover:

Loading a valid PDF file
Handling a non-existent file
Processing a PDF with multiple pages
Dealing with empty PDF files

Documentation

The main documentation has been updated to include usage examples for the PdfLoader. This includes how to initialize the loader and integrate it with the EmbeddingsBuilder.

Related Issue

Tachikoma000 requested a review from cvauclair September 18, 2024 22:18

Tachikoma000 closed this Sep 18, 2024

Tachikoma000 deleted the PDF_Loader branch September 18, 2024 22:35
