Make embedding generation task use correct run #2074
base: main
Conversation
Hello @shanbady, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request addresses issue #6832 by modifying the embedding generation task to use the 'next run' of a learning resource instead of the latest published run when generating embeddings for content files. This ensures that the embedding process uses the most up-to-date content. The changes modify the `start_embed_resources` and `embed_learning_resources_by_id` functions in `vector_search/tasks.py` to use `course.next_run` instead of filtering for the latest published run. A new test case, `test_embedded_content_from_next_run`, is added to `vector_search/tasks_test.py` to verify this behavior.
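For orientation, the shape of the change is roughly as follows (a minimal sketch, not the actual diff; the helper name and surrounding variables are assumptions):

```python
def _select_run_for_embedding(course):
    """Illustrative sketch of the run-selection change (hypothetical helper).

    Before this PR the task picked the latest published run; after it, the
    task uses the run the resource designates as its next run.
    """
    # Old behavior (sketch): latest published run by start date
    # run = course.runs.filter(published=True).order_by("-start_date").first()

    # New behavior (sketch): the resource's designated next run
    return course.next_run
```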
Highlights
- Content Embedding: The embedding generation task now uses the 'next run' of a learning resource for content files.
- Code Modification: The `start_embed_resources` and `embed_learning_resources_by_id` functions in `vector_search/tasks.py` were modified to use `course.next_run`.
- Testing: A new test case, `test_embedded_content_from_next_run`, was added to `vector_search/tasks_test.py` to verify the new behavior.
Changelog
- `vector_search/tasks.py`
  - Modified the `start_embed_resources` function (line 107) to use `course.next_run` instead of filtering for the latest published run.
  - Modified the `embed_learning_resources_by_id` function (line 191) to use `course.next_run` instead of filtering for the latest published run.
- `vector_search/tasks_test.py`
  - Added a new test case, `test_embedded_content_from_next_run` (lines 226-261), to verify that content files are embedded from the next course run (see the sketch after this list).
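To make the intent of that test concrete, here is a self-contained sketch of the behavior it asserts, using stand-in classes rather than the project's factories (all names below are illustrative assumptions, not the code added in this PR):

```python
from dataclasses import dataclass, field


@dataclass
class FakeRun:
    """Stand-in for a learning resource run (illustration only)."""
    content_files: list[str] = field(default_factory=list)


@dataclass
class FakeResource:
    """Stand-in for a learning resource with a designated next run."""
    runs: list[FakeRun] = field(default_factory=list)
    next_run: FakeRun | None = None


def content_files_to_embed(resource: FakeResource) -> list[str]:
    """Sketch of the behavior the new test asserts: files come from next_run."""
    run = resource.next_run
    return run.content_files if run else []


def test_embedded_content_from_next_run_sketch():
    older = FakeRun(content_files=["old_syllabus.pdf"])
    upcoming = FakeRun(content_files=["new_syllabus.pdf"])
    resource = FakeResource(runs=[older, upcoming], next_run=upcoming)
    assert content_files_to_embed(resource) == ["new_syllabus.pdf"]
```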
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Did you know?
The concept of word embeddings gained prominence with the introduction of Word2Vec by Tomas Mikolov and his team at Google in 2013.
Footnotes
[1] Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
The pull request modifies the embedding generation task to use the correct run (next run) for content files. The changes look good and the new test case validates the fix. I have a few minor suggestions.
Summary of Findings
Assessment
The pull request effectively addresses the issue of embedding content files from the correct course run. The code changes are straightforward, and the addition of a test case enhances confidence in the fix. I recommend addressing the minor suggestions before merging, and having others review and approve this code before it is merged.
Turns out this won't work with OLL courses (they are all archived, so there is no "next" run). Looking at the `sync_edx_course_files` function, it does this:
resource = run.learning_resource
if run != (
resource.next_run
or resource.runs.filter(published=True).order_by("-start_date").first()
):
continue
So the code should probably make sure that `resource.next_run` isn't None, and if it is, use the run with the latest start date instead.
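A hedged sketch of that fallback, mirroring the quoted `sync_edx_course_files` pattern (the helper name is an assumption, not code in this PR):

```python
def _run_for_contentfiles(resource):
    """Prefer the designated next run; otherwise fall back to the latest
    published run, so archived resources (e.g. OLL courses) are not skipped.
    """
    return (
        resource.next_run
        or resource.runs.filter(published=True).order_by("-start_date").first()
    )
```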
What are the relevant tickets?
Closes https://github.com/mitodl/hq/issues/6832
Description (What does it do?)
This PR makes the embedding process for content files pull the content files from the "next run" of the learning resource rather than the latest run.
How can this be tested?
`python manage.py generate_embeddings --all`