Instructor 1.5.0 #1029
ivanleomk announced in Announcements
We're releasing instructor 1.5.0 today! In this announcement, we'll highlight some of the new changes and share some of the techniques and results we've found from our own experiments and benchmarks. We'll cover:

- the new `context` keyword
- the `google-generativeai` package for multimodal content

Jinja Support
We've introduced Jinja support in instructor 1.5.0 with the new `context` keyword. This replaces the original `validation_context` keyword, allowing you to use the same set of variables for both prompt formatting and validation. This in turn allows you to use `SecretStr` values to prevent sensitive variables from being logged.

To use Jinja, all you need to do is pass in a prompt formatted with Jinja's syntax (which you can read about here) and your variables inside the `context` keyword, and your prompt will be rendered automatically. This feature is currently supported for the Cohere, Anthropic, OpenAI and Gemini clients (both VertexAI and google-generativeai) and should work once you've installed the relevant dependencies for each client.

We can even write more complex prompts that use if-else conditions and iterate over lists of objects passed into them. This significantly simplifies the code we need to write.
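Under the hood, instructor renders your prompt as a Jinja template using the values passed in `context`. Here is a minimal sketch of how a templated prompt with a conditional and a loop gets rendered; the prompt and variables are illustrative, and we use `jinja2` directly to show the final prompt that would be sent to the model.

```python
from jinja2 import Template

# A prompt using Jinja conditionals and loops, written the same way you
# would pass it to client.chat.completions.create(..., context={...}).
prompt = """Answer the question: {{ question }}
{% if rules %}Follow these rules:
{% for rule in rules %}- {{ rule }}
{% endfor %}{% endif %}"""

# instructor renders the template with the values in `context`;
# we render it directly here to show the result.
context = {"question": "What is instructor?", "rules": ["Be concise", "Cite sources"]}
rendered = Template(prompt).render(**context)
print(rendered)
```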
Gemini Support

See an example of how to work with multimodal content with vertexai here.

We've also expanded general support for Gemini's multimodal capabilities in this release, allowing you to work with audio, video and images all within the same prompt.
This enables a few creative use cases, such as using multimodal input as few-shot examples. In the example below, we interleave audio and text together in a prompt for better transcriptions that are in line with what you care about.
We're using the Fleurs dataset here, which contains audio files and corresponding ground-truth transcripts. We load the dataset and then transcribe it with Flash. From initial experiments, this can decrease the Word Error Rate (WER) by almost 10%, matching that of Whisper Large.
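For reference, the Word Error Rate used here is the word-level edit distance between the model's transcript and the ground truth, divided by the number of reference words. A minimal implementation, as a sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion across six reference words
```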
We can imagine scaling this up to more complex examples where we have videos interleaved with audio extraction examples before asking for specific timestamps.
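An interleaved prompt like the one described above can be assembled as a flat list that mixes text and uploaded audio parts, which is how `google-generativeai` accepts mixed content. This is a sketch only: the helper function, file names, and transcript are hypothetical placeholders, and the API call runs only when a key is configured.

```python
import os

def build_interleaved_prompt(examples, target_audio):
    """examples: (audio_part, transcript) pairs; returns a mixed content list
    that interleaves few-shot audio examples with their transcripts."""
    contents = []
    for audio, transcript in examples:
        contents.append("Example audio:")
        contents.append(audio)
        contents.append(f"Transcript: {transcript}")
    contents.append("Now transcribe this clip in the same style:")
    contents.append(target_audio)
    return contents

if os.getenv("GOOGLE_API_KEY"):  # only call the API when a key is configured
    import google.generativeai as genai

    example = genai.upload_file("example.wav")  # hypothetical paths
    target = genai.upload_file("target.wav")
    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(
        build_interleaved_prompt([(example, "the quick brown fox")], target)
    )
    print(response.text)
```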
Caching
Prompt caching is now supported on both the Anthropic and Gemini clients. This opens up a variety of techniques which were previously prohibitively expensive.
Contextual Retrieval with Anthropic
Anthropic recently outlined a technique called Contextual Retrieval that takes advantage of their prompt caching. By using Haiku to generate new context for each chunk, with a prompt that explains the chunk in the context of the overall document, they managed to significantly improve retrieval performance.
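The core of the technique can be sketched in a few lines: for every chunk, ask a small model to situate the chunk within the full document, then index the generated context together with the chunk. The prompt below is paraphrased from Anthropic's post, and the model call is injectable (stubbed here) so a real Haiku call, with the document prefix cached, can be plugged in.

```python
# Prompt paraphrased from Anthropic's Contextual Retrieval post.
CONTEXT_PROMPT = """<document>
{document}
</document>
Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>
Give a short, succinct context to situate this chunk within the overall
document for the purposes of improving search retrieval of the chunk."""

def contextualize(document: str, chunks: list[str], llm) -> list[str]:
    """Prefix each chunk with model-generated situating context before indexing."""
    contextualized = []
    for chunk in chunks:
        context = llm(CONTEXT_PROMPT.format(document=document, chunk=chunk))
        contextualized.append(f"{context}\n\n{chunk}")
    return contextualized

# Stubbed model call for illustration; swap in a real (cached) Haiku call.
def fake_llm(prompt: str) -> str:
    return "This chunk is from the Q2 section of the report."

print(contextualize("...full document text...", ["Revenue grew 3%."], fake_llm))
```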
We've written up a small example of how to implement this in Instructor here.
Gemini Caching
With our new Gemini support, we can also take advantage of Google's caching capabilities, which extend to audio, video, image and text content.

This is tremendously useful if we're extracting structured data from these formats (read more about caching with Gemini here).
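As a rough sketch of what explicit caching looks like with `google-generativeai`: upload the media once, create a cache that holds it, then construct a model from the cached content. The model name, file path, and TTL below are illustrative, and the API calls run only when a key is configured.

```python
import datetime
import os

# How long the cached content should live (illustrative).
CACHE_TTL = datetime.timedelta(hours=1)

if os.getenv("GOOGLE_API_KEY"):  # only call the API when a key is configured
    import google.generativeai as genai
    from google.generativeai import caching

    # Upload the long audio file once via the files API (path is hypothetical)...
    podcast = genai.upload_file("podcast.mp3")

    # ...then create an explicit cache holding it, and query against the cache.
    cache = caching.CachedContent.create(
        model="models/gemini-1.5-flash-001",
        contents=[podcast],
        ttl=CACHE_TTL,
    )
    model = genai.GenerativeModel.from_cached_content(cached_content=cache)
    print(model.generate_content("Summarise the podcast in one paragraph.").text)
```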
As a simple example, we can cache a 3-hour podcast that we uploaded using the files API and then verify cache usage with the `create_with_completion` method.

We can adapt the original Contextual Retrieval strategy from Anthropic, as seen above, with two key changes. First, we need to pass in the client itself. Secondly, we need to initialise the cache manually and specify the model ahead of time. You'll also need to use the `GEMINI_JSON` mode, since we can't use tool calling with the cache.

Structured Outputs with Gemini
We're excited to announce that instructor now supports structured outputs using tool calling for both the Gemini SDK and the VertexAI SDK. It's important to note that this is solely for text-only input at the moment, since function calls are currently in beta for the Gemini API; read more about it here.

All you need to do is invoke the `from_gemini` or `from_vertexai` methods and you're good to go. Let's see an example of how to use this new mode.
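Here is a minimal sketch using the Gemini SDK; the model name and prompt are illustrative, and the API call runs only when a key is configured.

```python
import os
from pydantic import BaseModel

# The response model that the structured output is validated against.
class User(BaseModel):
    name: str
    age: int

if os.getenv("GOOGLE_API_KEY"):  # only call the API when a key is configured
    import instructor
    import google.generativeai as genai

    client = instructor.from_gemini(
        client=genai.GenerativeModel(model_name="models/gemini-1.5-flash-latest"),
    )
    user = client.messages.create(
        messages=[{"role": "user", "content": "Jason is 25 years old."}],
        response_model=User,
    )
    print(user)
```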
What’s Next?
Expect a lot of great new features to come as we continue to build out the instructor library.