-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add 100 Samples Per Regex / JSON Schema #35
Conversation
a199b7c
to
698809b
Compare
91c66eb
to
80b549e
Compare
src/benchmark_lfe.py
Outdated
for i in range(len(regex_example_tokens)): | ||
_ = token_enforcer.get_allowed_tokens(regex_example_tokens[: i + 1]) | ||
for regex_sample in regex_samples: | ||
regex_sample_tokens = self.tokenizer.encode(regex_sample) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get this out of the timing method by pre-tokenizing the samples so we don't time this.
Given that the timings for Note that timings for this function on |
Ran the benchmarks locally with
|
pyproject.toml
Outdated
"outlines==0.0.46", | ||
"outlines-core==0.1.0", | ||
"lm-format-enforcer==0.10.7", | ||
"outlines==0.1.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to compare to the Numba version, can you use an earlier version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're no longer maintaining the Numba implementation of regex.py
, wouldn't it make sense to reference the last benchmark run prior to replacement rather than continuously tracking it?
Outlines benchmarks: https://github.com/dottxt-ai/outlines/actions/runs/11079055001/job/30787437777
I could also perform a single run of this suite with the Numba implementation without merging it if that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for now, we need the numbers for the outlines-core
release. We’ll tag main
once we’re happy with the setup, refer people to this tag for comparisons with Outlines and eventually remove it. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
outlines-core
doesn't have caching. I assume you'd like me to use Outlines caching with outlines-core
? (for now we can just copy https://github.com/dottxt-ai/outlines/blob/main/outlines/fsm/guide.py#L76-L99)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also let's use the latest version of outlines-core
.
08f10af
to
0d73c1c
Compare
By default ASV runs warmup steps prior to the measured run, resulting in the unexpected caching and generator exhaustion described above.
Seeing more sane benchmarks locally for a small subset. Will analyze the results of latest benchmark run first to ensure this is necessary. |
A few comments:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The benchmark method names, i.e. time_{package}
, seem a little redundant. The package is already given by the class name, and exactly what's being timed isn't apparent. Can we change one of those so that it clarifies exactly what is being measured?
Pushed a5adbe4 to fix benchmarks (sample run)
Changes
BenchmarksFor
Edit Pushed c86e55d which fixes a bug resulting in |
cc7400b
to
c86e55d
Compare
Just a heads-up: The You can see the benchmark workflow run for |
f208928
to
1d53dbc
Compare
Everything works as intended, so I will merge this PR. I will do a follow-up PR to separate the |
Fixes #19
Changes
src/samples/
data.py
: Removeexample
key and replace withsamples
keysrc/benchmark_*.py
ASV benchmark scripts to run 100 samples per benchmarkCaveat: We need to use
RegexGuide.from_regex
once dottxt-ai/outlines#1204 is merged and outlines version is bumped.Sample Generation Scripts
phone_number.json
url.json
gsm8k.json
complex_str.json
long_integer.json
recording_schema.json
andrpg_characters.json
TODO
outlines-core
is faster thanoutlines
on regex and conversely on JSON.outlines-core
's latest release