Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fs 102/create report agent #34

Merged
merged 34 commits into from
Dec 2, 2024
Merged

Fs 102/create report agent #34

merged 34 commits into from
Dec 2, 2024

Conversation

stevenhillcox-sl
Copy link
Collaborator

@stevenhillcox-sl stevenhillcox-sl commented Nov 25, 2024

Description

https://scottlogic.atlassian.net.mcas.ms/browse/FS-102

Changelog

  • Add a new report agent to form and execute prompts relating to ESG report generation
  • Add new user/system prompt templates for ESG report generation
  • Add report agent into report_director.py flow

Steven Hillcox and others added 21 commits November 22, 2024 11:09
… the same way. We may want to think about creating some kind of 'SystemAgent' class that doesn't use invokve with an utternace in the future. Ignore an annoying pyright error that is a non-issue
…emove the example output which was restricting the prompts flexability
…nt getter model of the other agents. updated prompts to more accurately describe relationships and updated generate cypher query to be less restricted through copying examples which was causing bad cypher queries to be generated. added one BDD test based on the new dataset
…tfoo to have reusable prompt generator for all promptfoo tests.
# Conflicts:
#	.env.example
#	backend/src/director.py
#	backend/src/prompts/templates/generate-knowledge-graph-model.j2
#	backend/src/utils/dynamic_knowledge_graph.py
prompts: file://promptfoo_test_runner.py:create_prompt

tests:
- description: "test model prompt references all csv headers in result using valid json format"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add an assert for this test to check the output against?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mainly used this to test that our LLMs were giving sensible responses. I am not entirely sure if there's anything more I can test at this point until we learn more about the ESG capabilities.
I suppose you could theoretically check that the output is formatted as markdown, but because almost anything is valid markdown, I am not sure how I would implement that.
Happy to hear any suggestions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree, not much to test here. In that case, I'd just update the test description to description: sample test for prompt development or something along those lines

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

async def invoke(self, utterance: str) -> str:
user_prompt = engine.load_prompt(
"create-esg-report-user-prompt",
document_text=utterance)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit funny that we need to pass in the file via utterance and we need to annotate with @agent, our base Agent class is purpose built around the core "chat" style functionality of InferESG.

This is Something for another ticket entirely - but it feels like we need a new concept here, possibly creating a SystemAgent or ReportAgent base class that makes a cleaner separation from the Agent class - or possibly work the other way and change Agent class as it is now into ChatAgent.

What you've built here looks good and matches the designs we agreed on for this ticket

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually tried renaming the variable to something more sensible, but the linter didn't seem to like it.

@IMladjenovic
Copy link
Collaborator

Can we hook this up to the API and report director and test with swagger? http://localhost:8250/docs

@IMladjenovic
Copy link
Collaborator

Can we hook this up to the API and report director and test with swagger? http://localhost:8250/docs

Ah, we are still waiting for Mike's ticket to go in to get the report director changes

user_prompt_template: "create-esg-report-user-prompt"
system_prompt_template: "create-esg-report-system-prompt"
user_prompt_args:
document_text: "Published September 2024 Carbon Reduction Plan
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried running your prompt with the kingfisher business report (https://scottlogic.atlassian.net/wiki/spaces/FS/pages/4422729729/Kingfisher+Responsible+Business+Report) it cut off the response after the 2nd bullet point and social and didn't even get to Governance. The response is probably too long.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it completely doesn't work for the McDonald's impact report (https://scottlogic.atlassian.net/wiki/spaces/FS/pages/4422696964/McDonald+s+Impact+Report+2023+24)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I run this, I am seeing it work for mcdonalds without any changes required. Not sure what's different - were you using the large mistral model when you tested?


prompts: file://promptfoo_test_runner.py:create_prompt

tests:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we add a few basic tests
for example:

  • has the 3 headings "Environmental", "Social" and "Governance"
  • for the AWS report the percentages should be easy enough to test since it's 100% environmental
  • another test for another report eg kingfisher
  • any content that might be wanting to be tested?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added tests for the first bullet point.

For testing the specific content we expect to see in the report, I would lean towards not locking ourselves into expecting anything particular at the moment. Once this is in we need to take stock of where we are, what we are seeing and present it to Arbdn to get feedback on the direction. Trying to capture specifics in these reports via tests is hard because this could change as we learn more from Arbdn

@@ -61,6 +63,7 @@ def load_env(self):
self.files_directory = os.getenv("FILES_DIRECTORY", default_files_directory)
self.answer_agent_llm = os.getenv("ANSWER_AGENT_LLM")
self.intent_agent_llm = os.getenv("INTENT_AGENT_LLM")
self.report_agent_llm = os.getenv("ESG_REPORT_AGENT_LLM")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ivan to fix this (remove ESG_)

@IMladjenovic IMladjenovic merged commit df60207 into main Dec 2, 2024
4 checks passed
@IMladjenovic IMladjenovic deleted the FS-102/create-report-agent branch December 2, 2024 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants