tutorials/tweetbot_elo/config.yaml



llm_agents:

  prompt_writer:
    model: openai/gpt-4o-mini
    template: |
      Pretend that you are a professional content writer who writes good content. Write a detailed instruction for an AI content writer which instructs it create the content that the user is asking for, suited for the {{ platform }} platform.

      Make sure to instruct the AI to be specific, direct, and professional.

      Here is the user request: I want to {{ user_request }}

      Here is the abstract of the paper: {{ paper_abstract }}

      Keep your prompt SHORT.

      Respond ONLY with the final instruction and NOTHING else in your response.
    template_params:
      paper_abstract: |
        Abstract: This paper, titled 'LLM experiments with simulation: Large Language Model Multi-Agent System for Simulation Model Parametrization in Digital Twins', presents a novel design of a multi-agent system framework that applies large language models (LLMs) to automate the parametrization of simulation models in digital twins. This framework features specialized LLM agents tasked with observing, reasoning, decision-making, and summarizing, enabling them to dynamically interact with digital twin simulations to explore parametrization possibilities and determine feasible parameter settings to achieve an objective. The proposed approach enhances the usability of simulation model by infusing it with knowledge heuristics from LLM and enables autonomous search for feasible parametrization to solve a user task. Furthermore, the system has the potential to increase user-friendliness and reduce the cognitive load on human users by assisting in complex decision-making processes.


  content_agent: &content_agent
    model: openai/gpt-4o-mini
    template: |
      You are a developer relations manager at a company selling evaluation and observability tooling to AI application builders. You extract fun and practically useful information from technical content to engage your audience on how to build and test AI applications better.

      Remember the content you are writing about is not written by you. You may refer to the authors by using the pronouns they/them. 

      Make sure to mention why should an AI developer care about the content you are writing about. 

      You are writing a single post for {{ platform }}, so the post should be concise and engaging. 

      {{ platform_instructions }}

      Make sure sentences are concise and don't use flowery language
      It shouldn't sound too salesy or promotional
      Don't use too many adjectives or adverbs
      Don't make general claims
      Don't start with a question

      At the end of the post, make one single thought-provoking claim. Do NOT make more than one claim.

      Please respond with the post only and no other text before or after it.

      Here is an example of a good response in the right style:

      {{ example }}
    hyperparams:
      temperature: 1
      max_tokens: 1024

  twitter_content_agent:
    << : *content_agent
    template_params:
      platform: Twitter
      platform_instructions: |
        Make sure the tweet is long. 
        Make sure you have an interesting perspective about the topic, don't just summarize.
        Try to be creative, but make sure you are always being truthful.
      example: |
        Mixture-of-Agents as an Event-Driven System 🤖☎️
        This paper by @JunlinWang3 shows you how to ensemble smaller LLMs to create a system that can outperform state-of-the-art larger models.

        We've implemented this paper in a fully async, event-driven workflow system thanks to @ravithejads
        - treat each “small LLM” as an event-driven step that can process incoming events and respond to it, independently and in parallel.

        ✅ Take full advantage of processing an entire batch of requests
        ✅ Cleaner, readable code

  linkedin_content_agent:
    << : *content_agent
    template_params:
      platform: LinkedIn
      platform_instructions: |
        Make sure the tone is professional and engaging.
        Don't exaggerate any claims.
        Don't sound like a marketer or a salesperson. Sound like an intelligent investor.
      example: |
        Ever wonder why your LLM app aces your test suite but stumbles in production? You might be seeing dataset drift.

        Real-world usage is dynamic. User inputs evolve, model behavior changes (remember those unexpected OpenAI updates?), and user expectations shift. Meanwhile, our golden datasets often remain frozen in time. The result? "Dataset Drift" - where your test cases no longer represent real user queries.

        The solution? Build a data flywheel 🔄

        1️⃣ Continuously monitor and evaluate logs from production (using feedback, auto-evals, etc.)
        2️⃣ Use metrics and scores to filter your logs and find underperforming queries
        3️⃣ Add these edge-cases to your test bank and manually correct LLM outputs
        😎 Watch your test suite evolve with users

        We’re seeing teams significantly improve LLM reliability by adopting this approach. Remember: Both your datasets and eval criteria need to continuously evolve with real-world usage!

        Slides from my OSS4AI talk on this topic: https://lnkd.in/eHjhkkV9


evaluators:

  tweet_char_count:
    checker: numrange
    target: (0, 20]
    asserts: on

  llm_tweet_choice_judge:
    wraps:
      allm_pairwise_judge:
        repeat: 3
        aggregate: flatten
    
    criteria: |
      - Make sure sentences are concise and don't use flowery language.
      - It should say something interesting to an AI developer.
      - It should make a profound and true claim about AI.
      - It shouldn't sound too salesy or promotional.
      - It should be factually accurate.
      - It should be grammatically correct.
      - Don't use too many adjectives or adverbs.
      - Don't make general claims.
      - Don't start with a question.

    aggregate: elo_ratings