-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fs 102/create report agent #34
Conversation
… the same way. We may want to think about creating some kind of 'SystemAgent' class that doesn't use invokve with an utternace in the future. Ignore an annoying pyright error that is a non-issue
…emove the example output which was restricting the prompts flexability
…nt getter model of the other agents. updated prompts to more accurately describe relationships and updated generate cypher query to be less restricted through copying examples which was causing bad cypher queries to be generated. added one BDD test based on the new dataset
…tfoo to have reusable prompt generator for all promptfoo tests.
# Conflicts: # .env.example # backend/src/director.py # backend/src/prompts/templates/generate-knowledge-graph-model.j2 # backend/src/utils/dynamic_knowledge_graph.py
prompts: file://promptfoo_test_runner.py:create_prompt | ||
|
||
tests: | ||
- description: "test model prompt references all csv headers in result using valid json format" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add an assert
for this test to check the output against?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mainly used this to test that our LLMs were giving sensible responses. I am not entirely sure if there's anything more I can test at this point until we learn more about the ESG capabilities.
I suppose you could theoretically check that the output is formatted as markdown, but because almost anything is valid markdown, I am not sure how I would implement that.
Happy to hear any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree, not much to test here. In that case, I'd just update the test description to description: sample test for prompt development
or something along those lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
async def invoke(self, utterance: str) -> str: | ||
user_prompt = engine.load_prompt( | ||
"create-esg-report-user-prompt", | ||
document_text=utterance) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a bit funny that we need to pass in the file via utterance
and we need to annotate with @agent
, our base Agent class is purpose built around the core "chat" style functionality of InferESG.
This is Something for another ticket entirely - but it feels like we need a new concept here, possibly creating a SystemAgent
or ReportAgent
base class that makes a cleaner separation from the Agent
class - or possibly work the other way and change Agent
class as it is now into ChatAgent
.
What you've built here looks good and matches the designs we agreed on for this ticket
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually tried renaming the variable to something more sensible, but the linter didn't seem to like it.
Can we hook this up to the API and report director and test with swagger? http://localhost:8250/docs |
Ah, we are still waiting for Mike's ticket to go in to get the report director changes |
user_prompt_template: "create-esg-report-user-prompt" | ||
system_prompt_template: "create-esg-report-system-prompt" | ||
user_prompt_args: | ||
document_text: "Published September 2024 Carbon Reduction Plan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried running your prompt with the kingfisher business report (https://scottlogic.atlassian.net/wiki/spaces/FS/pages/4422729729/Kingfisher+Responsible+Business+Report) it cut off the response after the 2nd bullet point and social and didn't even get to Governance. The response is probably too long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it completely doesn't work for the McDonald's impact report (https://scottlogic.atlassian.net/wiki/spaces/FS/pages/4422696964/McDonald+s+Impact+Report+2023+24)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I run this, I am seeing it work for mcdonalds without any changes required. Not sure what's different - were you using the large mistral model when you tested?
|
||
prompts: file://promptfoo_test_runner.py:create_prompt | ||
|
||
tests: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we add a few basic tests
for example:
- has the 3 headings "Environmental", "Social" and "Governance"
- for the AWS report the percentages should be easy enough to test since it's 100% environmental
- another test for another report eg kingfisher
- any content that might be wanting to be tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added tests for the first bullet point.
For testing the specific content we expect to see in the report, I would lean towards not locking ourselves into expecting anything particular at the moment. Once this is in we need to take stock of where we are, what we are seeing and present it to Arbdn to get feedback on the direction. Trying to capture specifics in these reports via tests is hard because this could change as we learn more from Arbdn
backend/src/utils/config.py
Outdated
@@ -61,6 +63,7 @@ def load_env(self): | |||
self.files_directory = os.getenv("FILES_DIRECTORY", default_files_directory) | |||
self.answer_agent_llm = os.getenv("ANSWER_AGENT_LLM") | |||
self.intent_agent_llm = os.getenv("INTENT_AGENT_LLM") | |||
self.report_agent_llm = os.getenv("ESG_REPORT_AGENT_LLM") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ivan to fix this (remove ESG_
)
Description
https://scottlogic.atlassian.net.mcas.ms/browse/FS-102
Changelog
report_director.py
flow