-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fetch files recursively #10
Conversation
@stevethomas just a consideration, could we perhaps change it so that it does recursive only when wildcard is passed Something along the lines of
What do you think? It does give more granular control. |
Hi @stevethomas This is an intentional feature. I wanted to give full control to the user on what specific directories and files to index, mainly to eliminate "clumsy" indexing/genuine mistakes from a developer that could inadvertently expose any or all of the following:
Every index is shared with two third parties. Obviously, it is the user's responsibility to ensure nothing unintentional is shared to OpenAI/Pinecone and not mine. However, if such details are still shared, even with explicit files and directories passed as is the current logic, then that is most certainly the fault of the user. If you use Laragenie on many large projects, you would always want to mitigate as many unintentional indexes as possible. On some level I agree, adding lots of directories repetitively to the array in the config can be slightly tedious, but it gives you and other developers a lot more clarity on what is shared. I am open to some sort of recursion, but for now I will close this PR as it conflicts with the intended implementation. You are welcome to start a new PR if you have figured out a reasonable alternative. Thanks, |
Yes I agree the wildcard communicates the recursiveness more clearly. |
I agree in principle that developers should definitely be thinking about any sensitive code they may be sharing with another 2 external parties, on the other hand it would be poorly written code that was hardcoding secrets anywhere other than .env in a Laravel context, and obviously we should not index that. Would you accept the |
Yes I agree it would be poorly written, but also wanted to mitigate as much as possible in the first run of this package. Open to @zbora23's suggestion and happy for either of you to work out a solution that has no negative effect on the current functionality. |
I'm not sure if this was a deliberate decision, but I think it makes sense for files to be retrieved recursively, because in many scenarios the files in the source code are split across lots of directories and specifying all of them is hard work and requires a lot more upkeep.
If you didn't want this to be the default, another option could be to have a
recursive
config option which defaults to false.Unrelated but the other thing I noticed immediately is that the OpenAI API seems to regularly flake out with "Service Unavailable" exceptions. Not sure if I just got unlucky with my timing or if this is normal; if it is normal perhaps aretry()
could help to avoid having to restart a large indexing operation from scratch (assuming one of retries is successful)Nevermind, I the OpenAI API was having issues yesterday