CoT is after the answer #15
Comments
This is ideally a human-readable database, so answer -> justification might have been included, as human readers have no problem with that. However, your point is well taken - maybe we should make a future contributing guideline to have the CoT come before the answer.
I'd err on the "fix the examples that have incorrect ordering" side of things if it were me, as it makes downstream data generation more likely to have a usable format. There aren't that many examples at this point, so it's easier to do it now than later when there are many more in this repo. That's just my $0.02 though.
If you have some time, would you want to help us reorder the CoT in the examples to come first? Thanks!
I'll take a stab. Building a little classifier to help identify them is going to be my first step, then I'll do some manual corrections and try to automate fixing them. It honestly seems like quite a useful tool if actually fleshed out.
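Roughly, the kind of first-pass heuristic I have in mind looks like the sketch below; the marker list and position threshold are made-up placeholders, and the real classifier would need something better than regexes:

```python
import re

# Hypothetical markers that often signal a final answer being stated.
ANSWER_MARKERS = [
    r"\bthe answer is\b",
    r"\btherefore\b",
    r"^\s*answer\s*:",
]


def answer_position(response: str) -> float | None:
    """Relative position (0.0-1.0) of the earliest answer-like marker, or None."""
    lowered = response.lower()
    positions = [
        m.start()
        for pattern in ANSWER_MARKERS
        if (m := re.search(pattern, lowered, flags=re.MULTILINE))
    ]
    if not positions:
        return None
    return min(positions) / max(len(response), 1)


def looks_post_hoc(response: str, threshold: float = 0.25) -> bool:
    """Flag responses whose answer marker shows up in the first quarter of the text."""
    pos = answer_position(response)
    return pos is not None and pos < threshold


# The conclusion appears in the first sentence, so this example gets flagged.
example = "The answer is 42. To see why, note that there are 6 boxes of 7 items, and 6 * 7 = 42."
print(looks_post_hoc(example))  # True
```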
After working with GPT-4o and Claude 3.5 Sonnet on this issue, I'm pretty convinced that they are intentionally structuring their data outputs by default this way to confound training other models on totally synthetic data. The commercial models are just so inclined to put the reasoning steps after saying something that will already steer their output. I have a pretty good dataset of examples so far though, and I now have a pipeline that can identify other instances of post-hoc reasoning somewhat reliably. Going to keep working on it a bit more.
Alright. My initial attempt at fine-tuning Llama 3.1 8B didn't work well enough with my small sample size. I changed my approach and went with a very detailed, structured system prompt, which improved accuracy a ton, but still not quite enough to get accurate results for my complicated prompts/responses with Llama 3.1 8B or GPT-4o mini. Claude 3.5 Sonnet or GPT-4o work fine with it as long as the query/response aren't too large. I will be building a golden set of these responses and attempting another training run on a small model. I want to be able to run this type of analysis on large datasets, and that is vastly too expensive with Anthropic or OpenAI. Anyway, here is the system prompt with some integrated examples, and here is how you would prompt it in a single user prompt section (you may need to change the user/assistant identifiers for use with OpenAI):
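For reference, a minimal sketch of that single-user-message setup with the Anthropic SDK is below; the file name, wrapper tags, and model id are placeholder assumptions, not the actual pipeline:

```python
import anthropic

# Placeholder: the structured system prompt with integrated examples, stored in a file.
with open("post_hoc_analysis_prompt.txt") as f:
    structured_prompt = f.read()

query = "Identify any sarcasm or irony in this scenario: ..."
response_text = "This scenario contains irony. ..."

# Everything goes into a single user message instead of the system slot, so the same
# template can be swapped between providers by changing the user/assistant identifiers.
user_message = (
    f"{structured_prompt}\n\n"
    f"<user_query>\n{query}\n</user_query>\n\n"
    f"<response>\n{response_text}\n</response>"
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
result = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # assumed model id
    max_tokens=4096,
    messages=[{"role": "user", "content": user_message}],
)
print(result.content[0].text)  # the YAML analysis, like the full example below
```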
A full example would look like:
Response from Claude 3.5 Sonnet:

```yaml
analysis:
  input:
    user_query:
      q1:
        text: "Identify any sarcasm or irony in this scenario: A student who never studies brags to his friends, \"I'm so prepared for this exam, I didn't even open a book!\""
        type: "literary analysis"
        instructions: []
    response_markers:
      r1:
        start: "This scenario contains irony."
        end: "The student is likely using this sarcastic statement to mask their lack of preparation or to appear nonchalant about the exam."
    response_structure:
      r1:
        - content: "This scenario contains irony."
          part_type: "conclusion"
        - content: "The student's statement is sarcastic because they are claiming to be prepared for the exam while simultaneously admitting to not studying at all."
          part_type: "explanation"
        - content: "The irony lies in the contradiction between being prepared and not studying, as these typically don't go together."
          part_type: "analysis"
        - content: "The student is likely using this sarcastic statement to mask their lack of preparation or to appear nonchalant about the exam."
          part_type: "interpretation"
  individual_analysis:
    a1:
      question_id: "q1"
      response_id: "r1"
      reasoning_analysis:
        context_considerations:
          - factor: "Literary analysis question"
            impact: "Required identification and explanation of literary devices (sarcasm and irony)"
        issues:
          - description: "Conclusion stated before reasoning"
            severity: 3
            explanation: "The response begins with the conclusion that the scenario contains irony before providing the reasoning behind this assessment."
        criteria_violated:
          - "Conclusion presented before supporting evidence"
        criteria_met:
          - "Identifies both sarcasm and irony"
          - "Explains the contradiction that creates the irony"
          - "Provides interpretation of the student's possible motives"
      classification: "Problematic Reasoning"
      severity: 3
      justification: "While the response accurately identifies and explains the sarcasm and irony in the scenario, it presents the conclusion upfront before providing the supporting analysis. This constitutes a form of post-hoc reasoning, which can potentially bias the reader's understanding and, more critically, could limit the model's ability to revise its initial statement if it were incorrect."
  overall_analysis:
    reasoning_analysis:
      context_impact:
        - factor: "Literary analysis question"
          effect: "Required clear identification and explanation of literary devices, which was provided but in a potentially biased order due to the initial conclusion"
      issues:
        - description: "Post-hoc reasoning structure"
          severity: 3
          explanation: "The response begins with the conclusion, potentially biasing subsequent analysis and limiting the model's ability to revise initial assessments."
      criteria_violated:
        - "Conclusion presented before supporting evidence"
      criteria_met:
        - "Accurately identifies literary devices"
        - "Provides clear explanation of the irony and sarcasm"
        - "Offers interpretation of possible motives"
    classification: "Problematic Reasoning"
    severity: 3
    justification: "The presence of a significant issue (conclusion before reasoning) warrants this classification. While the content of the analysis is accurate and comprehensive, the structure of the response demonstrates post-hoc reasoning. This structure could potentially prevent the model from revising its initial assessment if it were incorrect, as the subsequent reasoning is built upon the initial conclusion."
  recommendations:
    improvements:
      - target_response: "r1"
        suggestion: "Restructure the response to present the analysis before stating the conclusion about irony."
        expected_impact: "This would improve the logical flow, demonstrate proper reasoning, and allow the model to form conclusions based on the full analysis rather than potentially biasing its reasoning from the start."
    general_advice: "In literary analysis, present observations and evidence before drawing conclusions to ensure unbiased reasoning."
  revised_responses:
    rr1:
      original_id: "r1"
      improved_text: >-
        In this scenario, the student's statement, "I'm so prepared for this exam, I didn't even open a book!" contains both sarcasm and irony.

        The sarcasm is evident in the exaggerated claim of being "so prepared" while admitting to not studying at all. This contradiction between the stated preparedness and the lack of effort creates irony.

        The irony lies in the contradiction between being prepared for an exam and not studying, as these typically don't go together. Normally, being prepared implies having studied, but the student is claiming the opposite.

        The student is likely using this sarcastic statement to mask their lack of preparation or to appear nonchalant about the exam. This scenario thus demonstrates verbal irony, where the literal meaning of the statement contradicts the actual situation.
  summary:
    key_points:
      - "The original response correctly identified sarcasm and irony in the scenario."
      - "Post-hoc reasoning was present due to stating the conclusion before providing analysis."
      - "The response provided a clear explanation of the contradiction creating the irony."
      - "An interpretation of the student's possible motives was included."
    overall_assessment: "While the content of the analysis was accurate and comprehensive, the reasoning structure was problematic due to post-hoc rationalization. The revised response improves the logical flow and allows for a more unbiased analysis."
  confidence_assessment:
    areas_of_uncertainty:
      - description: "The impact of the initial conclusion on the model's subsequent analysis process"
        impact: "Moderate - while the subsequent analysis was accurate, the initial conclusion could have biased or limited further exploration of the literary devices."
    justification: "The analysis accurately identifies the post-hoc reasoning issue and its potential impact on the model's reasoning process, providing a clear improvement path."
    overall_confidence: 0.9
```
I see that in quite a few examples there is essentially a CoT for the math steps, but it happens after the final answer is provided. This is much harder for the model to learn than having the CoT come before the final answer. It also doesn't help with accuracy as much, because the influence of the final answer is already in the context window before the model starts working through the steps of the calculation.
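For illustration, the two orderings look roughly like this (the field names are placeholders, not the repo's actual schema):

```yaml
# Harder to learn: the answer is already in context before the reasoning starts.
answer_first:
  answer: "The total is 42."
  cot: "There are 6 boxes with 7 items each, so 6 * 7 = 42."

# Preferred: the reasoning conditions the answer instead of rationalizing it after the fact.
cot_first:
  cot: "There are 6 boxes with 7 items each, so 6 * 7 = 42."
  answer: "The total is 42."
```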