feat: Experiment and dataset improvements #6163

Open: wants to merge 15 commits into main

@anticorrelator (Contributor) commented Jan 24, 2025

resolves #6082
resolves #6153
resolves #6139

  • Propagates metadata from spans to dataset examples
  • Propagates tool definitions from spans to dataset examples as input
  • Removes redundant playground experiment metadata
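A minimal sketch of the first bullet, assuming span attributes arrive as a flat dict in which `metadata` is a JSON-encoded string (as in the OpenInference conventions). `DatasetExample` and `span_to_example` are hypothetical names used for illustration, not the code in this PR.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DatasetExample:
    # Hypothetical container for a dataset example; not Phoenix's actual model.
    input: dict[str, Any]
    output: dict[str, Any]
    metadata: dict[str, Any] = field(default_factory=dict)


def span_to_example(attributes: dict[str, Any]) -> DatasetExample:
    """Build a dataset example from flattened span attributes (assumed layout)."""
    raw_metadata = attributes.get("metadata")
    metadata: dict[str, Any] = {}
    if isinstance(raw_metadata, str):
        # Span metadata is assumed to be a JSON string; ignore it if it won't decode.
        try:
            decoded = json.loads(raw_metadata)
        except json.JSONDecodeError:
            decoded = None
        if isinstance(decoded, dict):
            metadata = decoded
    elif isinstance(raw_metadata, dict):
        metadata = dict(raw_metadata)
    return DatasetExample(
        input={"input": attributes.get("input.value")},
        output={"output": attributes.get("output.value")},
        metadata=metadata,  # spread span metadata onto the example
    )
```

Malformed or non-object metadata is dropped rather than raising, so one bad span does not block building the rest of the dataset.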

@dosubot (bot) added the size:XS, size:S, and size:M labels and removed size:XS and size:S Jan 24, 2025 (size:M: this PR changes 30-99 lines, ignoring generated files)
@anticorrelator anticorrelator changed the title feat: Spread span metadata onto dataset examples feat: Experiment and dataset miprovements Jan 24, 2025
@axiomofjoy axiomofjoy changed the title feat: Experiment and dataset miprovements feat: Experiment and dataset improvements Jan 24, 2025
```python
if span_kind == LLM:
    return _get_llm_span_input(
        input_messages=input_messages,
        input_value=input_value,
        input_mime_type=input_mime_type,
        prompt_template_variables=prompt_template_variables,
        tool_definition=tool_definition,
```
Contributor:

This value refers to an attribute on tool spans.

Contributor (author):

I'm pretty sure these are defined under `tools` on LLM spans; I adjusted this to reflect the right structure.
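The distinction in this thread, sketched under an assumption: following the OpenInference semantic conventions, a tool span carries its own schema in `tool.json_schema`, while an LLM span lists the tools it was offered as flattened keys of the form `llm.tools.<i>.tool.json_schema`. The helper `collect_tool_definitions` is hypothetical and only shows how such keys could be gathered in index order.

```python
import re
from typing import Any

# Assumed OpenInference-style flattened key for one tool definition on an LLM span.
_TOOL_SCHEMA_KEY = re.compile(r"^llm\.tools\.(\d+)\.tool\.json_schema$")


def collect_tool_definitions(attributes: dict[str, Any]) -> list[str]:
    """Return the tool JSON schemas on an LLM span, ordered by tool index."""
    indexed: list[tuple[int, str]] = []
    for key, value in attributes.items():
        match = _TOOL_SCHEMA_KEY.match(key)
        if match and isinstance(value, str):
            indexed.append((int(match.group(1)), value))
    # Sort numerically so index 10 comes after index 2.
    return [schema for _, schema in sorted(indexed)]
```

A bare `tool.json_schema` key (the tool-span attribute) is deliberately not matched.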

@anticorrelator anticorrelator force-pushed the dustin/propagate-metadata branch from 7aecb06 to 6d59100 Compare January 29, 2025 14:18
```python
if span_kind == LLM:
    return _get_llm_span_input(
        input_messages=input_messages,
        input_value=input_value,
        input_mime_type=input_mime_type,
        prompt_template_variables=prompt_template_variables,
        tool_definitions=tool_definitions,
```
Contributor:

Can we just double-check that this works with the OpenAI fine-tuning format? Also, it might be simpler if the key were `tools`.

```python
if tool_definitions_data := [
    _safely_json_decode(tool_definition) for tool_definition in tool_definitions
]:
    input["tool_definitions"] = tool_definitions_data
```
Contributor:

Suggested change:

```diff
-    input["tool_definitions"] = tool_definitions_data
+    input["tools"] = tool_definitions_data
```

Kinda leaning in this direction. Does this work with the OpenAI fine-tuning / evals format?
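A minimal sketch of the decoding snippet in this thread with the suggested `tools` key. The body of `_safely_json_decode` is not shown in the PR, so `safely_json_decode` here is a hypothetical stand-in that assumes "safe" means falling back to the raw string when decoding fails; `add_tools` is likewise an illustrative name.

```python
import json
from typing import Any, Sequence


def safely_json_decode(value: str) -> Any:
    """Hypothetical stand-in for _safely_json_decode: return the raw string on failure."""
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return value


def add_tools(example_input: dict[str, Any], tool_definitions: Sequence[str]) -> dict[str, Any]:
    # The walrus check only sets the key when at least one definition exists,
    # so examples from spans without tools keep a clean input dict.
    if tools := [safely_json_decode(t) for t in tool_definitions]:
        example_input["tools"] = tools
    return example_input
```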

Contributor (author):

Looks like the tool definitions are a key alongside the input:

https://platform.openai.com/docs/guides/fine-tuning#preparing-your-dataset-for-dpo
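A sketch of what exporting an example for DPO could look like, assuming the record layout in the linked guide: each JSONL line has `input` (holding `messages` and, optionally, `tools`), `preferred_output`, and `non_preferred_output`. Verify those field names against the current OpenAI docs; `to_dpo_record` is a hypothetical helper, not part of Phoenix.

```python
import json
from typing import Any


def to_dpo_record(
    example_input: dict[str, Any],
    preferred: list[dict[str, Any]],
    non_preferred: list[dict[str, Any]],
) -> str:
    """Serialize one DPO training line (assumed layout; see the lead-in)."""
    dpo_input: dict[str, Any] = {"messages": example_input["messages"]}
    if tools := example_input.get("tools"):
        # Tool definitions sit inside "input", next to "messages".
        dpo_input["tools"] = tools
    record = {
        "input": dpo_input,
        "preferred_output": preferred,
        "non_preferred_output": non_preferred,
    }
    return json.dumps(record)
```

If the dataset example already stores its tools under a `tools` key, this export can pass them through unchanged.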

```diff
@@ -795,7 +795,7 @@ async def chat_completion_create(
         elif isinstance(event, anthropic_streaming.InputJsonEvent):
             raise NotImplementedError
         else:
-            assert_never(event)
+            assert_never(event)  # type: ignore
```
Contributor:

There's a fix for this that @RogerHYang put out; the unhandled case is the Anthropic citations event. Can you cherry-pick that and remove this ignore?
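A sketch of why handling the citations event makes the `# type: ignore` unnecessary. It uses hypothetical event classes rather than the real `anthropic` SDK types. `assert_never` only type-checks when every member of the union has been handled by an earlier branch, so the checker narrows `event` to `Never` in the `else`. At runtime it raises if ever reached.

```python
from dataclasses import dataclass
from typing import Union

try:
    from typing import assert_never  # Python 3.11+
except ImportError:  # older Pythons need the typing_extensions backport
    from typing_extensions import assert_never


@dataclass
class TextEvent:  # hypothetical stand-in
    text: str


@dataclass
class CitationsEvent:  # hypothetical stand-in for the citations event
    citation: str


@dataclass
class InputJsonEvent:  # hypothetical stand-in
    partial_json: str


StreamEvent = Union[TextEvent, CitationsEvent, InputJsonEvent]


def handle(event: StreamEvent) -> str:
    if isinstance(event, TextEvent):
        return event.text
    elif isinstance(event, CitationsEvent):
        # Without this branch, the checker sees CitationsEvent reaching the
        # else clause and reports assert_never(event) as an error.
        return f"[{event.citation}]"
    elif isinstance(event, InputJsonEvent):
        raise NotImplementedError
    else:
        assert_never(event)
```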

@dosubot (bot) added the size:L label and removed size:M Jan 31, 2025 (size:L: this PR changes 100-499 lines, ignoring generated files)
Labels: size:L (this PR changes 100-499 lines, ignoring generated files)
Projects: Status 📘 Todo
4 participants