
feat: Provide traceback and function source code to LLM advice model #48

Open · wants to merge 41 commits into base: main

Conversation

@PCain02 (Collaborator) commented Nov 10, 2024

Pull Request

1. feat: Provide traceback and function source code to LLM advice model

2. List the names of those who contributed to the project.
@PCain02

3. Link the issue the pull request is meant to fix/resolve.

4. Add all labels that apply. (e.g., documentation, ready-for-review)

  • enhancement
  • test
  • bug

5. Describe the contents and goal of the pull request.
The goal of this PR is to provide more information to the LLM advice model, including the traceback and the actual source code of the function(s) that failed the tests.

6. Will coverage be maintained/increased?
Coverage will decrease slightly, but test cases were added to compensate; it still remains at about 61%.

7. What operating systems has this been tested on? How were these tests conducted?
Windows 10, so please test on macOS and Linux. The tests were run with `poetry run task test`, and also by prompting the advice model and checking whether the responses are more valuable than before. Now that the traceback and failing function source code are given to the LLM, the responses are much more relevant and should no longer try to correct the test cases.

8. Include a code block and/or screenshots displaying the functionality of your feature, if applicable/possible.

I know this is a lot to process, so these figures should help with code comprehension.
(screenshots of the implementation omitted)

This is an example output:
(screenshot of example output omitted)
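
As a rough illustration of the feature, here is a minimal sketch of how the traceback text and the failing function's source could be bundled for the advice prompt. This is not the actual implementation; the helper name `build_advice_context` is hypothetical.

```python
import inspect
from typing import Callable, Dict


def build_advice_context(tested_function: Callable, traceback_text: str) -> Dict[str, str]:
    """Bundle the failing function's source code and the traceback for the advice prompt."""
    # inspect.getsource retrieves the source code of the function under test
    return {
        "function_source": inspect.getsource(tested_function),
        "traceback": traceback_text,
    }
```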

@PCain02 added the labels enhancement (New feature or request) and test (Test cases or test running) on Nov 10, 2024
@PCain02 added the label bug (Something isn't working) on Nov 10, 2024
@gkapfham (Collaborator)

Hi @PCain02, it looks like this branch now has conflicts that need to be resolved. Can you please investigate this issue when you have time?

@gkapfham (Collaborator)

Also, @PCain02, can you give an example of a command line that can be run, and a repository on which it can be run, so that we can all quickly test this feature?

@PCain02 (Collaborator, Author) commented Nov 13, 2024

> Hi @PCain02, it looks like this branch now has conflicts that need to be resolved. Can you please investigate this issue when you have time?

Oh, thanks for pointing that out; I can resolve those ASAP!

@PCain02 (Collaborator, Author) commented Nov 13, 2024

> Also, @PCain02, can you give an example of a command line that can be run, and a repository on which it can be run, so that we can all quickly test this feature?

Absolutely. The command I have been using is `poetry run execexam . ./tests/test_question_one.py --report trace --report status --report failure --report code --report setup --advice-method apiserver --advice-model anthropic/claude-3-haiku-20240307 --advice-server https://execexamadviser.fly.dev/ --report advice --fancy --debug`, but the model and server can be swapped out for whatever you would like to test with. Because of the way the advice works, the prompt remains the same no matter the model.

@PCain02 (Collaborator, Author) commented Nov 13, 2024

Also, the version in the TOML file will need to be changed before merging. I am not sure what was decided in class about merge order.

@PCain02 (Collaborator, Author) commented Nov 13, 2024

PR #41 for the Windows spacing bug should be merged before this PR because I made this change with that fix in mind.

@CalebKendra (Collaborator) previously approved these changes Nov 14, 2024 and left a comment:

LGTM! This adds lots of needed depth to the LLM generation

coverage.json: review comment (outdated, resolved)
@rebekahrudd (Collaborator) previously approved these changes Nov 15, 2024 and left a comment:

It works on my Linux computer.

Here's the output that I see:
(screenshot of the output omitted)

@PCain02 (Collaborator, Author) commented Nov 16, 2024

Thank you! And when you ran `poetry run task test`, did it work as well?

@gkapfham (Collaborator)

Hi @rebekahrudd, thanks for running this on Linux; I appreciate your help. Do you think that the advice that the coding mentor provides is now more helpful?

@gkapfham (Collaborator)

Hello @PCain02, I know that it is not a specific feature of your tool, but I'm wondering whether or not the collected stack trace (that you are passing to the LLM) should actually be one of the options that you can specify through the use of `--report stacktrace`? It seems like this might be helpful information that a student would want, right? Please note that if we add this extra report, we then need to be clear about the fact that there is already a `--report trace` that might need to be renamed to `--report testtrace`.

With all of those points in mind, and as a part of the review of this PR (and of whether or not we decide to add this feature), can you please give an example of the stack trace that your tool collects and passes along to the LLM? I recognize that your tool does not currently (nor does it absolutely need to) display this information to the user. However, it would be helpful to see this information as it is passed to the LLM, since it is a major factor in the type of output that the LLM produces.

@gkapfham (Collaborator)

Hi @CalebKendra and @rebekahrudd and others who reviewed this PR, can you please respond to @PCain02 about whether or not the test suite passes correctly on your operating system and setup? (It should; however, we have had some problems where running the tests in GitHub Actions differs from running them on developer workstations.) Thanks!

@PCain02 (Collaborator, Author) commented Nov 21, 2024

> Hello @PCain02, I know that it is not a specific feature of your tool, but I'm wondering whether or not the collected stack trace (that you are passing to the LLM) should actually be one of the options that you can specify through the use of `--report stacktrace`? It seems like this might be helpful information that a student would want, right? Please note that if we add this extra report, we then need to be clear about the fact that there is already a `--report trace` that might need to be renamed to `--report testtrace`.
>
> With all of those points in mind, and as a part of the review of this PR (and of whether or not we decide to add this feature), can you please give an example of the stack trace that your tool collects and passes along to the LLM? I recognize that your tool does not currently (nor does it absolutely need to) display this information to the user. However, it would be helpful to see this information as it is passed to the LLM, since it is a major factor in the type of output that the LLM produces.

Hello! I really like that idea, although I think some formatting would be required to make it useful to the student. Right now, here is an example of what the traceback looks like when given to the LLM; it is a list of dictionaries that contain the values for each test that failed:
```python
[
    {
        'test_path': 'tests/test_question_one.py::test_find_maximum_value',
        'source_file': 'questions/question_one.py',
        'tested_function': 'find_maximum_value',
        'full_traceback': 'E AssertionError: Maximum positive value in matrix\n assert 9 == 0',
        'error_type': 'AssertionError',
        'error_message': 'Maximum positive value in matrix',
        'stack_trace': [],
        'variables': {},
        'assertion_detail': 'assert 9 == 0',
        'expected_value': 0,
        'actual_value': 9,
    },
    {
        'test_path': 'tests/test_question_one.py::test_find_average_value',
        'source_file': 'questions/question_one.py',
        'tested_function': 'find_average_value',
        'full_traceback': 'E AssertionError: Average value in matrix with positive numbers\n assert 5.0 == 0.06111111111111111',
        'error_type': 'AssertionError',
        'error_message': 'Average value in matrix with positive numbers',
        'stack_trace': [],
        'variables': {},
        'assertion_detail': 'assert 5.0 == 0.06111111111111111',
        'expected_value': 0.06111111111111111,
        'actual_value': 5.0,
    },
]
```
In order for this to be usable information for a student, I imagine that giving the `full_traceback` along with the function name would be the most helpful. However, it may not be understandable in this format on the user end, so we could use f-strings to turn it into a phrase or phrases that the user can understand.
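
As a rough sketch of that f-string idea (not part of this PR; the helper name `format_failure` and the variable `failing_tests` are hypothetical, and only the dictionary keys shown above are assumed):

```python
def format_failure(info: dict) -> str:
    """Turn one failure dictionary (as shown above) into a short, student-readable phrase."""
    return (
        f"Test {info['test_path']} failed while checking "
        f"{info['tested_function']} in {info['source_file']}: "
        f"{info['error_type']}: {info['error_message']} "
        f"(expected {info['expected_value']!r}, got {info['actual_value']!r})."
    )


# usage sketch: failing_tests would be the list of dictionaries shown above
# phrases = [format_failure(entry) for entry in failing_tests]
```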

--
I would also like to show how the function source code is now being given to the LLM. It literally looks like this: a list of functions, where each function is a list whose items are the lines of the function as strings.
```python
[
    [
        'def find_maximum_value(matrix: List[List[int]]) -> Union[int, None]:',
        '"""Return the maximum value in the provided matrix."""',
        '# confirm that there is a value in the [0][0] position',
        'if not matrix or not matrix[0]:',
        'return None',
        'maximum_value = matrix[0][0]',
        'for row in matrix:',
        'for value in row:',
        'if value > maximum_value:',
        'maximum_value = value',
        'maximum_value = 0',
        'return maximum_value',
    ],
    [
        'def find_average_value(matrix: List[List[int]]) -> Union[float, None]:',
        '"""Find the average value in the provided matrix."""',
        '# check if the matrix is empty',
        'if not matrix or not matrix[0]:',
        'return None',
        '# initialize sum and count variables',
        'total_sum = 10',
        'count = 0',
        '# iterate over the matrix to calculate the sum and count the number of elements',
        'for row in matrix:',
        'for value in row:',
        'total_sum += value',
        'count += 100',
        '# calculate and return the average value',
        'return total_sum / count',
    ],
]
```
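
Note that the stored lines appear to have lost their original indentation. A minimal sketch (the name `rebuild_source` is hypothetical, not part of this PR) of joining this structure back into readable text before placing it in the prompt could look like this:

```python
from typing import List


def rebuild_source(functions: List[List[str]]) -> str:
    """Join each per-function list of source lines back into one readable text block."""
    # each inner list is one function; each string is one source line
    return "\n\n".join("\n".join(lines) for lines in functions)
```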

@CalebKendra (Collaborator)

> Hi @CalebKendra and @rebekahrudd and others who reviewed this PR, can you please respond to @PCain02 about whether or not the test suite passes correctly on your operating system and setup? (It should; however, we have had some problems where running the tests in GitHub Actions differs from running them on developer workstations.) Thanks!

@PCain02 I just tested the pytest suite with Ubuntu and it passed.

@PCain02 dismissed stale reviews from rebekahrudd and CalebKendra via e475565 on November 21, 2024 at 21:55
@rebekahrudd (Collaborator)

> Hi @rebekahrudd, thanks for running this on Linux; I appreciate your help. Do you think that the advice that the coding mentor provides is now more helpful?

Yes, this is much more helpful now!

@rebekahrudd (Collaborator)

> Thank you! And when you ran `poetry run task test`, did it work as well?

Yes! It worked when I ran `poetry run task test`:
(screenshot from 2024-11-22 of the passing test run omitted)

@rebekahrudd (Collaborator) left a comment:

Here is the output I see on Linux; it looks good!

(screenshot omitted)

Labels: bug (Something isn't working), enhancement (New feature or request), test (Test cases or test running)
Projects: none yet
4 participants