A more performant streaming json parser #511
CNSeniorious000 started this conversation in Show and tell
We're actually waiting on pydantic's very own jiter to support faster partial parsing.
Description
I have been an LLM developer for a few months, and structured output tools like instructor have helped me a lot. Recently I found the core implementation of streaming JSON parsing in instructor, in instructor/dsl/partialjson.py, and I think there is still room to improve it:
The backslash character (\) is not handled correctly.
The type of a value may change during iteration.
In the example above, "a" is not a field in the pydantic model, but it is still yielded by the iterator (think of "a" as a placeholder for a token). Then the type of "ab" changes from None to list.
.the current implementation doesn't support non-standard json types like
NaN
andInfinity
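For reference, Python's standard json module already reads and writes these constants by default, even though they are not part of RFC 8259 JSON, so a partial parser layered on json.loads has a natural baseline to match. The snippet uses only the standard library; reject_constant is a hypothetical hook showing how a caller could opt out via parse_constant instead.

```python
import json
import math

# json.loads accepts NaN, Infinity and -Infinity out of the box.
data = json.loads('{"x": NaN, "y": Infinity, "z": -Infinity}')
assert math.isnan(data["x"]) and data["y"] == math.inf and data["z"] == -math.inf

# json.dumps emits them too (pass allow_nan=False to raise ValueError instead).
print(json.dumps([float("nan"), float("inf")]))  # [NaN, Infinity]


def reject_constant(name: str) -> float:
    """parse_constant hook: json calls it with "NaN", "Infinity" or "-Infinity"."""
    raise ValueError(f"non-standard JSON constant: {name}")


try:
    json.loads("[NaN]", parse_constant=reject_constant)
except ValueError as exc:
    print(exc)  # non-standard JSON constant: NaN
```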
The implementation is also hard-wired to json.loads, but it may be better to use pydantic's faster JSON parser, or a third-party JSON parser like ujson/orjson.
Solution
I made a Python package (as well as an almost identical JS package) to parse partial JSON. It is designed to be flexible and performant, and it also handles NaN well (demo). Flexible means that it works by completing the trailing JSON tokens: you get a complete JSON string, which you can then pass to any JSON parsing function you like.
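The suffix-completion idea can be sketched as follows. This is an illustration written for this post, not partial-json-parser's actual code or API; complete_json and parse_partial are hypothetical names, and the input is assumed to be a prefix of valid JSON. The sketch keeps complete tokens, repairs a truncated last token (cutting a dangling backslash or an unfinished \uXXXX escape, completing partial literals such as tr or -Inf, trimming numbers like 1e+), drops keys whose value has not started, then closes the open brackets and hands the result to whatever loads function the caller passes in.

```python
import json
import re
from typing import Any, Callable, List

# NaN/Infinity are included because Python's json accepts them.
_LITERALS = ("true", "false", "null", "NaN", "Infinity", "-Infinity")
# A run of characters that can belong to a number or a bare literal.
_WORD = re.compile(r'[^\s,:\[\]{}"]+')


def complete_json(prefix: str) -> str:
    """Turn a prefix of a valid JSON document into a complete JSON string.

    Assumes `prefix` really is a prefix of valid JSON (no validation is done).
    """
    out: List[str] = []    # complete (or repaired) tokens, in order
    stack: List[str] = []  # closers still owed, e.g. ["}", "]"]
    i, n = 0, len(prefix)
    while i < n:
        ch = prefix[i]
        if ch.isspace():
            i += 1
        elif ch in "{[":
            out.append(ch)
            stack.append("}" if ch == "{" else "]")
            i += 1
        elif ch in "}]":
            out.append(ch)
            stack.pop()
            i += 1
        elif ch in ",:":
            out.append(ch)
            i += 1
        elif ch == '"':
            j, cut = i + 1, None
            while j < n and prefix[j] != '"':
                if prefix[j] == "\\":
                    # \uXXXX takes 6 chars, every other escape takes 2.
                    width = 6 if prefix[j + 1:j + 2] == "u" else 2
                    if j + width > n:  # the stream stopped inside an escape
                        cut = j
                        break
                    j += width
                else:
                    j += 1
            if cut is None and j < n:  # closing quote found
                out.append(prefix[i:j + 1])
                i = j + 1
            else:  # truncated string: drop a partial escape, then close it
                end = n if cut is None else cut
                out.append(prefix[i:end] + '"')
                i = n
        else:
            word = _WORD.match(prefix, i).group()
            i += len(word)
            if i < n:  # followed by a delimiter, so the token is complete
                out.append(word)
            elif word.lstrip("-")[:1].isalpha():  # partial literal: "tr", "-Inf"
                out.append(next((lit for lit in _LITERALS if lit.startswith(word)), word))
            else:  # partial number: "1.", "1e", "1e+" lose the dangling suffix
                number = word.rstrip(".eE+-")
                if number:
                    out.append(number)
    # Drop structure that has no value yet, so half-streamed keys never appear.
    while out:
        last = out[-1]
        if last == ",":
            out.pop()
        elif last == ":":
            del out[-2:]  # the colon and the key before it
        elif stack and stack[-1] == "}" and last.startswith('"') and out[-2] in ("{", ","):
            out.pop()  # an object key whose value has not started
        else:
            break
    return "".join(out) + "".join(reversed(stack))


def parse_partial(prefix: str, loads: Callable[[str], Any] = json.loads) -> Any:
    """Complete `prefix`, then hand it to any str -> object JSON parser."""
    text = complete_json(prefix)
    return loads(text) if text else None


# Usage: every prefix of a document parses, and the full text matches json.loads.
doc = json.dumps({"ab": [1, 1e20, None, True], "path": "C:\\tmp\\",
                  "u": "\u00e9", "x": float("inf"), "y": float("-inf")})
for k in range(len(doc) + 1):
    parse_partial(doc[:k])  # raises if a completed prefix were invalid JSON
assert parse_partial(doc) == json.loads(doc)
print(parse_partial('{"ab": [1, 2'))  # {'ab': [1, 2]}
print(parse_partial('{"a'))           # {}
```

Because parse_partial only needs a str -> object function, json.loads could be swapped for another parser, with the caveat that a strict parser may reject NaN and Infinity. Dropping keys whose value has not started is one possible design choice; with it, a half-streamed key like "a" never shows up, and "ab" first appears once its value has begun.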
Performance:
(both use json.dumps under the hood)
partial-json-parser is well tested with the hypothesis test framework, which generates unusual JSON documents; the tests then check that every prefix, from the first character to the whole string, is parsable. The package runs on Python 3.6 to 3.12, and runs even faster on PyPy. I started it as a personal project about half a year ago, and I hope it can benefit the LLM ecosystem.
Suggestion
If you agree, can I draft a PR that powers instructor.Partial with partial-json-parser, using pydantic's JSON parser, to fix the \ and type issues mentioned above?