Test errors in generate_code are malformed #63

Open
dustinbyrne opened this issue Sep 22, 2024 · 2 comments
Labels
bug Something isn't working

Comments

dustinbyrne commented Sep 22, 2024

To find examples of this issue, look for instances of <test-errors> within generate.input.txt logs. It would appear that generate is called with malformed test_errors.

The trajectories confirm the test errors returned from summarize_test_errors are valid.

E.g.:

## Preventing test errors

Generate code that avoids the following test errors:

<test-errors>
1
e
O
.
x
)
n
y
m
S
3
g
E
u
z
r
b
q
w
t
a
P
D
 
`
M
4
G
/
=
#
o
>
T
,
6
l
U
N
L
5
s
0
%
h
:
F
c
I
(
A
k
]
2
"
j
'
v
[
p
B
7
C
<
i
8
9
f


d
_
</test-errors>
dustinbyrne added the bug label Sep 22, 2024

github-actions bot commented Sep 22, 2024

Title

Correct malformed test_errors in generate_code calls

Problem

The function generate_code is being called with malformed test_errors, so the <test-errors> section of the generated input text (generate.input.txt) contains isolated single characters, one per line, instead of readable error messages.

Analysis

The issue lies in how test_errors is passed around and used in the code base. The malformed output suggests that either:

  1. test_errors is garbled when it is generated or formatted, or
  2. test_errors is collected or aggregated incorrectly from different sources before being passed to generate_code.

Given that the trajectories confirm the test errors returned from summarize_test_errors are valid, the problem likely exists somewhere after the summarize_test_errors function is called and before the generate_code function is invoked with these errors.
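
One hypothesis fits the sample output, though it is an assumption and not a confirmed diagnosis. If summarize_test_errors returns a single str, then calling set.update with it adds every distinct character as a separate element, because update iterates over its argument. The sketch below uses hypothetical helper names and a made-up error message to contrast update with add:

```python
from typing import Set


def collect_with_update(summary: str) -> Set[str]:
    """Hypothetical helper: mimics calling set.update with a plain string."""
    errors: Set[str] = set()
    # update() iterates its argument, so a str is split into characters.
    errors.update(summary)
    return errors


def collect_with_add(summary: str) -> Set[str]:
    """Hypothetical helper: stores the whole summary as one element."""
    errors: Set[str] = set()
    errors.add(summary)
    return errors


# Made-up error text, used only for illustration.
summary = "AssertionError: expected 1, got 2"
as_characters = collect_with_update(summary)
as_message = collect_with_add(summary)
```

If this is what is happening, each entry written to the <test-errors> block would be one unique character, in arbitrary order, which matches the shape of the example in the report.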

Proposed Changes

  1. solver/workflow/summarize_test_errors.py: Ensure summarize_test_errors is returning a properly formatted list of error strings by logging its output for verification.

  2. solver/workflow/generate_and_validate_code.py: Modify the code that collects test_errors to ensure it aggregates the errors correctly. Add logging to verify the content of the test_errors set before passing it to generate_code.

  3. solver/workflow/solve_code.py: Ensure that the test_errors parameter passed to generate_code is properly formatted. Add verification to check the integrity and format of test_errors.

Here are the suggested changes for each file component:

  1. solver/workflow/summarize_test_errors.py:

    • Verify the output of summarize_test_errors function to ensure it returns a correctly formatted string.
    • Add debugging logs to inspect the intermediate variables that play a role in formatting output.
  2. solver/workflow/generate_and_validate_code.py:

    • In the generate_and_validate_code function, ensure test_errors are aggregated correctly.
    • Add logging statements to inspect the collected test_errors before they are passed to the generate_code.

    Here is the section to focus on:

    accumulator.extend(results)
    return accumulator
    
    test_errors = set()
    
    def collect_errors(work_dir: WorkDir, run_test_result: RunTestResult):
        if run_test_result.test_status == TestStatus.ERROR:
            if run_test_result.test_output:
                test_errors.update(
                    summarize_test_errors(work_dir, run_test_result.test_output)
                )
  3. solver/workflow/solve_code.py:

    • Review and update the generate_code method to ensure test_errors is properly formatted and evaluated before being utilized.
    • Add a check to ensure the test_errors list is well-formed before it is used in generating the code.

    Example section of code:

    def generate_code(
        self, work_dir: WorkDir, plan: str, test_errors: List[str]
    ) -> Optional[Patch]:
        self.clean_git_state()
    
        generator = GenerateCode(
            self.log,
            work_dir,
            self.trajectory_file,
            plan,
            self.python_version,
            self.limits.file_limit,
        )
        def generate(attempt, lint_errors: List[str]):
            code = generator.generate(attempt, lint_errors, test_errors)
            # Add logging to inspect test_errors here
            self.log("debug", f"Test errors passed to code generation: {test_errors}")
            return generator.apply(attempt, code)

By verifying and correcting the test_errors at these points, you should be able to ensure that the generate_code calls receive properly formatted error messages, resulting in correctly generated input text.
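
A minimal sketch of the kind of well-formedness check suggested above follows. The function name and the heuristic (more than one entry, all at most one character long) are assumptions made for illustration and are not part of the solver code base:

```python
from typing import Iterable


def looks_character_split(test_errors: Iterable[str]) -> bool:
    """Heuristic (assumed): flag a collection whose entries are all single characters.

    A single short entry is not flagged, since one entry alone is not
    enough evidence that a string was split apart.
    """
    items = list(test_errors)
    return len(items) > 1 and all(
        isinstance(item, str) and len(item) <= 1 for item in items
    )


# Illustrative inputs only.
suspicious = looks_character_split(set("TypeError"))
healthy = looks_character_split(["TypeError: unsupported operand"])
```

A check like this could run just before the errors are handed to code generation, logging a warning when it returns True so that the malformed input shows up in the logs.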

dustinbyrne added and removed the bug label Sep 22, 2024

kgilpin commented Sep 23, 2024

I haven't been able to reproduce this :-(
