embed course metadata as contentfile #2050
base: main
/gemini review
Code Review
The pull request introduces a new feature to embed course metadata as content files, which enhances the chat agent's ability to retrieve course information. The implementation involves generating an "about this course" document and storing it in the contentfiles collection. The changes seem well-structured and include necessary modifications to the vector search utilities and tests. However, due to the limit on the number of review comments, I am unable to provide specific feedback on the code. I recommend that the author review the changes carefully and ensure that they meet the requirements of the project.
Summary of Findings
Merge Readiness
The pull request appears to be well-structured and introduces a valuable feature. However, due to the limit on the number of review comments, I am unable to provide specific feedback on the code. I recommend that the author review the changes carefully and ensure that they meet the requirements of the project. I am unable to approve the pull request in any circumstance; users should have others review and approve this code before merging.
Back to you with some comments!
```python
def generate_metadata_document(serialized_resource):
    """
    Generate a plain-text info document to embed in the contentfile collection
    """
    ...  # body not shown in this review excerpt
```
Can this be replaced with a serializer that takes a learning resource? Something like ContentFileLearningResourceMetadataSerializer
👍 That might be a cleaner way to do this. Will have a look.
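As a rough illustration of the serializer suggested above, here is a plain-Python sketch that renders a learning resource as a plain-text "about this course" document. The `LearningResource` dataclass and its fields (`title`, `offered_by`, `platform`, `description`, `topics`) are hypothetical stand-ins for the real model; in the actual codebase this would presumably be a Django REST Framework serializer rather than a standalone class, and the `data` property only loosely echoes that convention.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LearningResource:
    # Hypothetical stand-in for the LearningResource model; field names are assumptions.
    readable_id: str
    title: str
    offered_by: Optional[str] = None
    platform: Optional[str] = None
    description: Optional[str] = None
    topics: List[str] = field(default_factory=list)


class ContentFileLearningResourceMetadataSerializer:
    """Plain-Python sketch of the serializer proposed in review.

    Renders a learning resource as a plain-text "about this course" document.
    """

    # (label, attribute) pairs, rendered in this order.
    FIELDS = [
        ("Title", "title"),
        ("Readable ID", "readable_id"),
        ("Offered by", "offered_by"),
        ("Platform", "platform"),
        ("Description", "description"),
    ]

    def __init__(self, instance: LearningResource):
        self.instance = instance

    @property
    def data(self) -> str:
        lines = []
        for label, attr in self.FIELDS:
            value = getattr(self.instance, attr)
            if value:  # skip empty fields so the document stays compact
                lines.append(f"{label}: {value}")
        if self.instance.topics:
            lines.append("Topics: " + ", ".join(self.instance.topics))
        return "\n".join(lines)
```

Skipping empty fields keeps the embedded text short, so empty labels do not add tokens to the vector without adding information.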
```python
resource_vector_point_id = str(vector_point_id(readable_id))
ids.append(resource_vector_point_id)
course_info_document = generate_metadata_document(doc)
metadata.append(
    ...  # arguments not shown in this review excerpt
)
```
Seems like, other than resource_vector_point_id, most of this could be moved into generate_metadata_document (or the serializer that I think you should replace it with).
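One way to act on this suggestion, sketched under assumptions: the helper returns the whole payload and the caller only computes the point ID. The uuid5-based `vector_point_id`, the `collect_points` loop, and the payload keys `content_type` and `chunk_content` are hypothetical; only `resource_readable_id` mirrors a parameter visible in the PR's test URL.

```python
import uuid
from typing import Any, Dict, List, Tuple

# Assumption: point IDs are deterministic uuid5 values; the real
# vector_point_id helper may derive them differently.
POINT_NAMESPACE = uuid.NAMESPACE_URL


def vector_point_id(readable_id: str) -> uuid.UUID:
    """Hypothetical deterministic point ID for a resource."""
    return uuid.uuid5(POINT_NAMESPACE, readable_id)


def generate_metadata_document(serialized_resource: Dict[str, Any]) -> Dict[str, Any]:
    """Build everything for the metadata point except its ID."""
    # Render non-empty fields as "key: value" lines.
    text = "\n".join(
        f"{key}: {value}"
        for key, value in serialized_resource.items()
        if value not in (None, "", [])
    )
    return {
        "resource_readable_id": serialized_resource["readable_id"],
        "content_type": "metadata",  # hypothetical marker for this document type
        "chunk_content": text,  # hypothetical key for the text to embed
    }


def collect_points(
    serialized_resources: List[Dict[str, Any]],
) -> Tuple[List[str], List[Dict[str, Any]]]:
    ids: List[str] = []
    metadata: List[Dict[str, Any]] = []
    for doc in serialized_resources:
        # The caller only computes the point ID; the payload comes from the helper.
        ids.append(str(vector_point_id(doc["readable_id"])))
        metadata.append(generate_metadata_document(doc))
    return ids, metadata
```

Because the ID is derived deterministically from the readable ID, re-embedding the same resource would target the same point rather than creating a duplicate, assuming the real helper behaves similarly.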
What are the relevant tickets?
Fixes (part 1 of) https://github.com/mitodl/hq/issues/6725
Description (What does it do?)
This PR adds embeddings for general course info (what shows up in the resource panel) in the contentfiles collection so that the chat agent can get that info from the contentfile vector endpoint directly.
How it works
Whenever we embed a new resource, we also generate an "about this course" document containing the course info and store it in the contentfiles collection. The same pattern can be followed for any other information we need to enrich the chat agent's responses.
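The "generate an extra document on every embed" pattern could be made extensible with a small registry of generators. This is a minimal sketch, assuming an in-memory list stands in for the contentfiles collection and omitting the embedding call itself; `ENRICHMENT_GENERATORS`, `embed_resource`, and the `kind` field are hypothetical names.

```python
from typing import Callable, Dict, List

# Each generator turns a serialized resource into one plain-text document.
DocumentGenerator = Callable[[dict], str]

# Hypothetical registry: "about" is the course-info document described above;
# further enrichment documents could be registered the same way.
ENRICHMENT_GENERATORS: Dict[str, DocumentGenerator] = {
    "about": lambda r: f"About this course: {r['title']}\nOffered by: {r.get('offered_by', 'unknown')}",
}


def embed_resource(resource: dict, contentfiles: List[dict]) -> None:
    """Whenever a resource is embedded, also store its enrichment documents.

    `contentfiles` is an in-memory stand-in for the vector collection; a real
    implementation would embed each document's text before upserting it.
    """
    for kind, generate in ENRICHMENT_GENERATORS.items():
        contentfiles.append(
            {
                "resource_readable_id": resource["readable_id"],
                "kind": kind,
                "content": generate(resource),
            }
        )
```

Adding another kind of enrichment document would then mean registering one more generator rather than touching the embedding loop.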
How can this be tested?
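1. Generate embeddings for a resource: python manage.py generate_embeddings --resource-ids <id>
2. Query the content file vector search endpoint, filtering by the resource's readable ID, e.g.: http://open.odl.local:8063/api/v0/vector_content_files_search/?limit=10&resource_readable_id=course-v1%3AMITxT%2B14.73x&q=who%20offers%20this%20course?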
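The query string in the test URL has to be percent-encoded by hand. A small helper like the hypothetical `content_file_search_url` below builds it with the Python standard library instead; note that it also encodes the trailing `?` in the question as `%3F`, which the hand-written URL leaves literal.

```python
from urllib.parse import quote, urlencode

# Local dev host and endpoint path from the testing steps in this PR.
BASE_URL = "http://open.odl.local:8063/api/v0/vector_content_files_search/"


def content_file_search_url(readable_id: str, question: str, limit: int = 10) -> str:
    """Build a search URL with query values percent-encoded (spaces as %20)."""
    params = {"limit": limit, "resource_readable_id": readable_id, "q": question}
    # quote_via=quote encodes spaces as %20 instead of "+".
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


print(content_file_search_url("course-v1:MITxT+14.73x", "who offers this course?"))
```

Passing `quote_via=quote` matters here: the default `quote_plus` would turn spaces into `+`, which is easy to confuse with the literal `+` in readable IDs such as `MITxT+14.73x`.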
Additional Context