Updated Chunking Methods #45
@russellb @abhi1092 Hi Russell and Abhi, I have made the final changes to the chunk_document() method, and it passes all CI tests. Could you review my PR? Thanks!
This pull request has merge conflicts that must be resolved before it can be merged.
The code moved to
A couple of things, @PalmPalm7:
Thank you for the review and the comments, @abhi1092! My justification: as we discussed, I assume our most common use case is converting PDFs into Markdown, so the default should be Markdown. Furthermore, when the language param is specified, RecursiveCharacterTextSplitter uses the following separators:
Instead of these separators.
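To make the separator argument concrete, here is a minimal pure-Python sketch of recursive character splitting. It is a simplified re-implementation for illustration only, not LangChain's RecursiveCharacterTextSplitter: DEFAULT_SEPARATORS mirrors that class's default list, while MARKDOWN_SEPARATORS is a hypothetical, heading-first subset and not the exact list LangChain uses for Markdown. The helper names split_keep and recursive_split are also hypothetical.

```python
DEFAULT_SEPARATORS = ["\n\n", "\n", " ", ""]
# Hypothetical heading-first subset for illustration; NOT LangChain's exact
# Markdown list. Every list must end with "" (the hard-cut fallback).
MARKDOWN_SEPARATORS = ["\n## ", "\n### ", "\n\n", "\n", " ", ""]


def split_keep(text: str, sep: str) -> list[str]:
    """Split on sep, re-attaching sep to the front of each following piece."""
    parts = text.split(sep)
    pieces = [parts[0]] + [sep + p for p in parts[1:]]
    return [p for p in pieces if p]  # drop an empty leading piece


def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Return chunks of at most chunk_size chars whose concatenation is text."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Use the first (coarsest) separator that actually occurs in the text.
    idx = next(i for i, s in enumerate(separators) if s == "" or s in text)
    sep, finer = separators[idx], separators[idx + 1:]
    if sep == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks: list[str] = []
    current = ""
    for piece in split_keep(text, sep):
        if len(current) + len(piece) <= chunk_size:
            current += piece  # greedily merge small neighbouring pieces
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is still too long: recurse with the finer separators.
            chunks.extend(recursive_split(piece, finer, chunk_size))
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    doc = "## Setup\nInstall it.\n## Usage\nRun the tool now.\n"
    print(recursive_split(doc, MARKDOWN_SEPARATORS, 30))
    # → ['## Setup\nInstall it.', '\n## Usage\nRun the tool now.\n']
    print(recursive_split(doc, DEFAULT_SEPARATORS, 30))
    # → ['## Setup\nInstall it.\n## Usage', '\nRun the tool now.\n']
```

With the heading-first list, each "## " heading stays attached to its body; with the default list, "## Usage" ends up stranded at the end of the first chunk and its body lands in the next one. That is the practical reason a Markdown-aware separator list is a sensible default for PDF-to-Markdown output.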
This is a valid concern. I used the original chunk_document() logic, but it doesn't properly handle chunks that exceed the context length either.
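One way to handle over-context-length chunks is a post-processing pass that re-splits any chunk exceeding the model's budget. The sketch below is hypothetical: enforce_context_limit is not an existing function in this repo, and it estimates tokens with a rough, assumed 4-characters-per-token heuristic. A real implementation would count tokens with the target model's own tokenizer.

```python
def enforce_context_limit(
    chunks: list[str], max_tokens: int, chars_per_token: int = 4
) -> list[str]:
    """Re-split chunks whose estimated token count exceeds max_tokens.

    Tokens are estimated as len(chunk) / chars_per_token (an assumed
    heuristic). Output pieces concatenate back to the input chunks.
    """
    max_chars = max_tokens * chars_per_token
    if max_chars < 1:
        raise ValueError("max_tokens * chars_per_token must be at least 1")
    out: list[str] = []
    for chunk in chunks:
        start = 0
        # Peel off windows until the remainder fits within the budget.
        while len(chunk) - start > max_chars:
            # Prefer to break just before the last space inside the window.
            end = chunk.rfind(" ", start + 1, start + max_chars)
            if end == -1:
                end = start + max_chars  # no space found: hard cut
            out.append(chunk[start:end])
            start = end
        out.append(chunk[start:])
    return out
```

Chunks already within the budget pass through untouched, so the check can run after any splitter. Breaking at whitespace keeps words intact where possible, and the hard cut guarantees termination on long unbroken strings such as URLs or base64 blobs.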
Gotcha. Let's create an issue for
Let me test this and we can merge it. @PalmPalm7, could you resolve the merge conflicts?
Add policy document for using GitHub Actions in workflows
Demonstrated new chunking methods in place of RecursiveCharacterTextSplitter()
Updated PR link: #65
Notable updates: