Problems were chosen randomly. A problem was dismissed and the next one was chosen if:
- problem's description contains images (images can't be given to ChatGPT)
- problem has more downvotes than upvotes (problem has low quality)
- problem was added in September 2021 (unsure whether it was published before or after ChatGPT's cutoff date)
- problem has medium difficulty (some medium problems are effectively easy, others are effectively hard)
- problem is an SQL problem
52 problems were chosen in total:
- 13 easy published before September 2021
- 13 hard published before September 2021
- 13 easy published after September 2021
- 13 hard published after September 2021
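The selection procedure above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the script that was actually used: the `Problem` record and its field names, the `is_dismissed`, `bucket` and `select` helpers, and the idea of shuffling a pool and filling four groups of 13 are all hypothetical.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass
class Problem:
    # Hypothetical record; the field names are assumptions made for this sketch.
    title: str
    difficulty: str   # "easy", "medium" or "hard"
    published: date
    upvotes: int
    downvotes: int
    has_images: bool
    is_sql: bool


def is_dismissed(p):
    """Return True if the problem matches any of the dismissal rules."""
    return (
        p.has_images
        or p.downvotes > p.upvotes
        or (p.published.year == 2021 and p.published.month == 9)
        or p.difficulty == "medium"
        or p.is_sql
    )


def bucket(p):
    """Group a non-dismissed problem by difficulty and publication period."""
    period = "before" if p.published < date(2021, 9, 1) else "after"
    return (p.difficulty, period)


def select(problems, per_bucket=13, seed=0):
    """Walk the problems in random order and fill each group up to per_bucket."""
    rng = random.Random(seed)
    pool = list(problems)
    rng.shuffle(pool)
    chosen = {(d, t): [] for d in ("easy", "hard") for t in ("before", "after")}
    for p in pool:
        if is_dismissed(p):
            continue  # dismissed, move on to the next random problem
        group = chosen[bucket(p)]
        if len(group) < per_bucket:
            group.append(p)
    return chosen
```

With a large enough pool, `select(...)` returns four lists of 13 problems each, 52 in total.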
ChatGPT-4 was used in all cases.
All dialogues followed the same pattern:
- intro and problem statement, request for solution without code
- (optional) prompt to correct the solution if it's obviously suboptimal
- request for working code in Python, starting with the given function declaration
- (optional) prompt to fix the code if it doesn't pass tests
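The order of the steps can be sketched as a small function. The step labels below are hypothetical names invented for this sketch and do not reproduce the actual wording, and the sketch assumes each optional prompt is sent at most once per dialogue.

```python
def dialogue_steps(solution_suboptimal: bool, tests_failed: bool) -> list[str]:
    """Return the ordered step labels for one dialogue (labels are illustrative)."""
    # Always start with the intro, the problem statement and a request
    # for a solution without code.
    steps = ["intro_and_problem_statement"]
    if solution_suboptimal:
        # Optional: ask for a better approach if the first one is obviously suboptimal.
        steps.append("correct_solution")
    # Always ask for working Python code starting with the given declaration.
    steps.append("request_python_code")
    if tests_failed:
        # Optional: ask for a fix if the code does not pass the tests.
        steps.append("fix_code")
    return steps
```

A dialogue that needs no follow-ups has two steps; one that needs both optional prompts has four.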
All dialogues used strictly the same wording. See phrases.txt for the specific templates.
Results spreadsheet: https://docs.google.com/spreadsheets/d/1C2b9ai-DBD4AwgRNEJuiLaGPUvwNie_1e-0vb6N5ohc/edit?usp=sharing