JSON schema support for dynamic types #771

Open
sxlijin opened this issue Jul 10, 2024 · 7 comments

Comments

@sxlijin
Collaborator

sxlijin commented Jul 10, 2024

We have a working implementation of this in #655 that allows users to inject JSON schemas into TypeBuilder using the following syntaxes. We're currently holding off on merging it, though, because JSON schema is a very complex format and we don't have any users asking for this. If you're interested in trying this out, please let us know and we'll merge it in and make sure it works for your use case!

Python

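from typing import Optional

import pydantic

# TypeBuilder and b come from the generated baml_client package (imports omitted here).

class Person(pydantic.BaseModel):
    last_name: list[str]
    height: Optional[float] = pydantic.Field(description="Height in meters")

tb = TypeBuilder()
# unstable_features.add_json_schema is the API proposed in #655 (not merged).
tb.unstable_features.add_json_schema(Person.model_json_schema())

res = await b.ExtractPeople(
    "My name is Harrison. My hair is black and I'm 6 feet tall. I'm pretty good around the hoop. I like giraffes.",
    {"tb": tb},
)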

TypeScript

import { z } from 'zod'
import { zodToJsonSchema } from 'zod-to-json-schema'

// TypeBuilder and b come from the generated baml_client package (imports omitted here).

const personSchema = z.object({
  animalLiked: z.object({
    animal: z.string().describe('The animal mentioned, in singular form.'),
  }),
  hobbies: z.enum(['chess', 'sports', 'music', 'reading']).array(),
  height: z.union([z.string(), z.number().int()]).describe('Height in meters'),
})

const tb = new TypeBuilder()
// unstableFeatures.addJsonSchema is the API proposed in #655 (not merged).
tb.unstableFeatures.addJsonSchema(zodToJsonSchema(personSchema, 'Person'))

const res = await b.ExtractPeople(
  "My name is Harrison. My hair is black and I'm 6 feet tall. I'm pretty good around the hoop. I like giraffes.",
  { tb },
)
@arunbahl

We'd love to use this feature.

We use Pydantic extensively, including at our ORM layer. This would allow us to continue defining types ergonomically in Python, and have them available in BAML functions.

@airhorns

airhorns commented Sep 4, 2024

Same! For those of us with a bunch of zod schemas already built out, it's a lot easier to adopt BAML going forward if we don't have to big-bang migrate everything over to BAML in one go, and can instead do it a little bit at a time and intermingle our existing zod schemas. This'd be great!

@sxlijin
Collaborator Author

sxlijin commented Sep 5, 2024

Here's a question about an alternative approach (sorry for the delayed response, @arunbahl !): would you be interested in something that could take your pydantic/zod schemas and generate BAML code from them?

Part of the reason we haven't shipped this yet is that:

  • part of the value proposition of using BAML, we believe, is the developer experience

    • you get live previews of your prompts as you edit them;
    • you get type-checking in your prompt templates as you expand them; and
    • you can define tests for your prompts right next to them, without having to write a bunch of pytest boilerplate

    all because your prompts are written in BAML

  • TypeBuilder is meant for types that must be defined on-the-fly, whereas most Pydantic and Zod schemas that we've seen are just defined statically
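
For statically defined schemas, the "generate BAML code from your schema" idea above could look roughly like the following. This is a minimal sketch, not the BAML team's tooling: `baml_type` and `class_to_baml` are hypothetical helper names, the input dict is hand-written in the shape Pydantic v2 is assumed to emit for the `Person` model, and only primitives, arrays, and `anyOf` unions (including optionals) are handled.

```python
import json

# Map JSON Schema primitive types to BAML primitive type names.
PRIMITIVES = {"string": "string", "integer": "int", "number": "float", "boolean": "bool"}


def baml_type(node: dict) -> str:
    """Translate one JSON Schema node into a BAML type expression (hypothetical helper)."""
    if "anyOf" in node:
        # Optional[T] is assumed to arrive as anyOf [T, {"type": "null"}].
        options = [o for o in node["anyOf"] if o.get("type") != "null"]
        optional = len(options) < len(node["anyOf"])
        inner = " | ".join(baml_type(o) for o in options)
        if len(options) > 1 and optional:
            inner = f"({inner})"
        return f"{inner}?" if optional else inner
    kind = node.get("type")
    if kind == "array":
        item = baml_type(node["items"])
        # Parenthesize union item types so the [] applies to the whole union.
        return f"({item})[]" if " | " in item else f"{item}[]"
    if kind in PRIMITIVES:
        return PRIMITIVES[kind]
    raise ValueError(f"unsupported schema node: {node!r}")


def class_to_baml(name: str, schema: dict) -> str:
    """Render a flat object schema as BAML class source text (hypothetical helper)."""
    lines = [f"class {name} {{"]
    for field_name, node in schema.get("properties", {}).items():
        line = f"  {field_name} {baml_type(node)}"
        if "description" in node:
            # json.dumps produces a double-quoted, escaped string literal.
            line += f" @description({json.dumps(node['description'])})"
        lines.append(line)
    lines.append("}")
    return "\n".join(lines)


# Hand-written schema, assumed to mirror Person.model_json_schema().
person_schema = {
    "title": "Person",
    "type": "object",
    "properties": {
        "last_name": {"type": "array", "items": {"type": "string"}},
        "height": {
            "anyOf": [{"type": "number"}, {"type": "null"}],
            "description": "Height in meters",
        },
    },
    "required": ["last_name", "height"],
}

baml_source = class_to_baml("Person", person_schema)
print(baml_source)
```

The printed class could then be pasted into a `.baml` file, which keeps the playground and type-checking benefits listed above; nested objects, enums, and `$ref` handling are left out of this sketch.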

@anerli

anerli commented Oct 23, 2024

I would also love to use this feature; it is more or less necessary for my use case if I'm to fully embrace BAML. The alternative approach @sxlijin proposed would not work for my use cases. For context, I am working on an agentic framework and using BAML as my prompting backend. There are two key reasons I need this feature:
(1) I am developing a Python library for other developers working with LLM agents, and part of this involves the developers providing their own schemas which eventually get passed to BAML dynamic return outputs on my backend. Without this feature, this isn't really possible without inventing my own schema system to convert properly to the necessary dynamic BAML types with the TypeBuilder. With this feature, I could enable developers to provide schemas in their preferred form - JSON schema, Pydantic, or with the BAML TypeBuilder if they wanted.
(2) I need to be able to serialize my dynamic output schemas. This is much easier if they are represented by Pydantic objects or JSON schemas, and doesn't currently seem possible with TypeBuilder.

The alternative approach would not work for me because:
(1) It would complicate how I enable developers to provide JSON schemas - having to convert to BAML first as opposed to just passing it in when I make the BAML call.
(2) Some of my schemas may be generated at runtime, e.g. as a derived result from other LLM calls - meaning I would not be able to create a corresponding Pydantic schema beforehand.
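
To make the "accept schemas in whatever form the developer prefers" point concrete, here is a rough sketch of a normalization step that a library could run before handing a schema to BAML. `to_json_schema` is a hypothetical helper, not part of BAML, and `FakeModel` is a stand-in for a Pydantic v2 model (which exposes a `model_json_schema()` classmethod) so that the sketch needs no third-party packages.

```python
import json
from typing import Any


def to_json_schema(source: Any) -> dict:
    """Normalize a caller-supplied schema into a plain JSON Schema dict (hypothetical helper)."""
    if isinstance(source, dict):
        return source
    if isinstance(source, str):
        # Treat strings as serialized JSON Schema documents.
        return json.loads(source)
    if hasattr(source, "model_json_schema"):
        # Duck-typed: Pydantic v2 models expose model_json_schema().
        return source.model_json_schema()
    raise TypeError(f"cannot derive a JSON schema from {source!r}")


class FakeModel:
    """Stand-in for a Pydantic model, so this sketch has no dependencies."""

    @classmethod
    def model_json_schema(cls) -> dict:
        return {"type": "object", "properties": {"name": {"type": "string"}}}


# A schema produced at runtime, e.g. derived from an earlier LLM call.
runtime_schema = {"type": "object", "properties": {"score": {"type": "integer"}}}

# Because the normalized form is plain data, it serializes and restores trivially.
stored = json.dumps(to_json_schema(runtime_schema), sort_keys=True)
restored = to_json_schema(stored)
```

With JSON schema support in TypeBuilder, the normalized dict could be passed straight through at call time; without it, a conversion layer has to sit between this step and the BAML call.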

@anerli

anerli commented Oct 23, 2024

On a related note, having serde options for TypeBuilder and FieldType objects would be a partial solution for my use case. I think JSON schema support for dynamic types currently solves my use case better because of the first point I mentioned. However, being able to serialize and deserialize TypeBuilder objects and field types would also add a lot of flexibility for me, though I'm not sure whether other people would make use of this. In general my use case requires a lot of dynamic types, so having maximum flexibility in how I work with them adds a lot of value for me.
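
One way to picture the serde idea is a plain-data description of a dynamic class that round-trips through JSON and could later be replayed against a builder. This is only a sketch under stated assumptions: `FieldSpec` and `ClassSpec` are hypothetical names that do not exist in BAML, and the type strings are illustrative.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class FieldSpec:
    """Serializable description of one dynamic field (hypothetical)."""

    name: str
    type: str  # illustrative type expression, e.g. "string" or "float"
    optional: bool = False
    description: Optional[str] = None


@dataclass
class ClassSpec:
    """Serializable description of one dynamic class (hypothetical)."""

    name: str
    fields: List[FieldSpec] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into the nested FieldSpec dataclasses.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "ClassSpec":
        raw = json.loads(text)
        return cls(name=raw["name"], fields=[FieldSpec(**f) for f in raw["fields"]])


spec = ClassSpec(
    name="Person",
    fields=[
        FieldSpec(name="last_name", type="string[]"),
        FieldSpec(name="height", type="float", optional=True, description="Height in meters"),
    ],
)

# Round trip: serialize, store or send over the wire, then rebuild an equal object.
round_tripped = ClassSpec.from_json(spec.to_json())
```

Replaying a restored `ClassSpec` into a real TypeBuilder would still need a small adapter, which is the part that built-in serde support would make unnecessary.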

@sxlijin
Collaborator Author

sxlijin commented Dec 12, 2024

@aaronvg
Contributor

aaronvg commented Dec 17, 2024

We have a community-contributed solution to this that you can check out here: https://github.com/BoundaryML/baml-examples/tree/main/json-schema-to-baml
