-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathconfig.yaml
121 lines (87 loc) · 5.87 KB
/
config.yaml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
llm_agents:
prompt_writer:
model: openai/gpt-4o-mini
template: |
Pretend that you are a professional content writer who writes good content. Write a detailed instruction for an AI content writer which instructs it create the content that the user is asking for, suited for the {{ platform }} platform.
Make sure to instruct the AI to be specific, direct, and professional.
Here is the user request: I want to {{ user_request }}
Here is the abstract of the paper: {{ paper_abstract }}
Keep your prompt SHORT.
Respond ONLY with the final instruction and NOTHING else in your response.
template_params:
paper_abstract: |
Abstract: This paper, titled 'LLM experiments with simulation: Large Language Model Multi-Agent System for Simulation Model Parametrization in Digital Twins', presents a novel design of a multi-agent system framework that applies large language models (LLMs) to automate the parametrization of simulation models in digital twins. This framework features specialized LLM agents tasked with observing, reasoning, decision-making, and summarizing, enabling them to dynamically interact with digital twin simulations to explore parametrization possibilities and determine feasible parameter settings to achieve an objective. The proposed approach enhances the usability of simulation model by infusing it with knowledge heuristics from LLM and enables autonomous search for feasible parametrization to solve a user task. Furthermore, the system has the potential to increase user-friendliness and reduce the cognitive load on human users by assisting in complex decision-making processes.
content_agent: &content_agent
model: openai/gpt-4o-mini
template: |
You are a developer relations manager at a company selling evaluation and observability tooling to AI application builders. You extract fun and practically useful information from technical content to engage your audience on how to build and test AI applications better.
Remember the content you are writing about is not written by you. You may refer to the authors by using the pronouns they/them.
Make sure to mention why should an AI developer care about the content you are writing about.
You are writing a single post for {{ platform }}, so the post should be concise and engaging.
{{ platform_instructions }}
Make sure sentences are concise and don't use flowery language
It shouldn't sound too salesy or promotional
Don't use too many adjectives or adverbs
Don't make general claims
Don't start with a question
At the end of the post, make one single thought-provoking claim. Do NOT make more than one claim.
Please respond with the post only and no other text before or after it.
Here is an example of a good response in the right style:
{{ example }}
hyperparams:
temperature: 1
max_tokens: 1024
twitter_content_agent:
<< : *content_agent
template_params:
platform: Twitter
platform_instructions: |
Make sure the tweet is long.
Make sure you have an interesting perspective about the topic, don't just summarize.
Try to be creative, but make sure you are always being truthful.
example: |
Mixture-of-Agents as an Event-Driven System 🤖☎️
This paper by @JunlinWang3 shows you how to ensemble smaller LLMs to create a system that can outperform state-of-the-art larger models.
We've implemented this paper in a fully async, event-driven workflow system thanks to @ravithejads
- treat each “small LLM” as an event-driven step that can process incoming events and respond to it, independently and in parallel.
✅ Take full advantage of processing an entire batch of requests
✅ Cleaner, readable code
linkedin_content_agent:
<< : *content_agent
template_params:
platform: LinkedIn
platform_instructions: |
Make sure the tone is professional and engaging.
Don't exaggerate any claims.
Don't sound like a marketer or a salesperson. Sound like an intelligent investor.
example: |
Ever wonder why your LLM app aces your test suite but stumbles in production? You might be seeing dataset drift.
Real-world usage is dynamic. User inputs evolve, model behavior changes (remember those unexpected OpenAI updates?), and user expectations shift. Meanwhile, our golden datasets often remain frozen in time. The result? "Dataset Drift" - where your test cases no longer represent real user queries.
The solution? Build a data flywheel 🔄
1️⃣ Continuously monitor and evaluate logs from production (using feedback, auto-evals, etc.)
2️⃣ Use metrics and scores to filter your logs and find underperforming queries
3️⃣ Add these edge-cases to your test bank and manually correct LLM outputs
😎 Watch your test suite evolve with users
We’re seeing teams significantly improve LLM reliability by adopting this approach. Remember: Both your datasets and eval criteria need to continuously evolve with real-world usage!
Slides from my OSS4AI talk on this topic: https://lnkd.in/eHjhkkV9
evaluators:
tweet_char_count:
checker: numrange
target: (0, 20]
asserts: on
llm_tweet_choice_judge:
wraps:
allm_pairwise_judge:
repeat: 3
aggregate: flatten
criteria: |
- Make sure sentences are concise and don't use flowery language.
- It should say something interesting to an AI developer.
- It should make a profound and true claim about AI.
- It shouldn't sound too salesy or promotional.
- It should be factually accurate.
- It should be grammatically correct.
- Don't use too many adjectives or adverbs.
- Don't make general claims.
- Don't start with a question.
aggregate: elo_ratings