Instructor 1.5.0 #1029
ivanleomk announced in Announcements
We're releasing instructor 1.5.0 today! In this announcement, we'll highlight some of the new changes and share some of the techniques and results we've found from our own experiments and benchmarks. We'll cover:

- the new `context` keyword
- the `google-generativeai` package for multimodal content

Jinja Support
We've introduced Jinja support in instructor 1.5.0 with the new `context` keyword. This replaces the original `validation_context` keyword, allowing you to use the same set of variables for both prompt formatting and validation. This in turn allows you to use `SecretStr` values to prevent sensitive variables from being logged.

To use Jinja, all you need to do is pass in a prompt formatted with Jinja's syntax (which you can read about here) and your variables inside the `context` keyword, and your prompt will be rendered automatically. This feature is currently supported for the Cohere, Anthropic, OpenAI and Gemini clients (both VertexAI and google-generativeai) and should work once you've installed the relevant dependencies for each client.

We can even write more complex prompts that use if-else conditions and iterate over lists of objects passed into them. This significantly simplifies the code we need to write.
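Under the hood, instructor renders your prompt as a Jinja template using the values passed in `context`. Here is a minimal sketch of how a templated prompt with a conditional and a loop gets rendered; the prompt and variables are illustrative, and we use `jinja2` directly to show the final prompt that would be sent to the model.

```python
from jinja2 import Template

# A prompt using Jinja conditionals and loops, written the same way you
# would pass it to client.chat.completions.create(..., context={...}).
prompt = """Answer the question: {{ question }}
{% if rules %}Follow these rules:
{% for rule in rules %}- {{ rule }}
{% endfor %}{% endif %}"""

# instructor renders the template with the values in `context`;
# we render it directly here to show the result.
context = {"question": "What is instructor?", "rules": ["Be concise", "Cite sources"]}
rendered = Template(prompt).render(**context)
print(rendered)
```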
Gemini Support

See an example of how to work with multimodal content with vertexai here.

We've also expanded general support for Gemini's multimodal capabilities in this release, allowing you to work with audio, video and images all within the same prompt.
This enables a few creative use cases, such as using multimodal input as few-shot examples. In the example below, we interleave audio and text together in a prompt for better transcriptions that are in line with what you care about.
We're using the Fleurs dataset here, which contains audio files and corresponding ground-truth transcripts. We load the dataset and then transcribe it with Flash. From initial experiments, this can decrease the Word Error Rate (WER) by almost 10%, matching that of Whisper Large.
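For reference, the Word Error Rate used here is the word-level edit distance between the model's transcript and the ground truth, divided by the number of reference words. A minimal implementation, as a sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion across six reference words
```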
We can imagine scaling this up to more complex examples where we have videos interleaved with audio extraction examples before asking for specific timestamps.
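An interleaved prompt like the one described above can be assembled as a flat list that mixes text and uploaded audio parts, which is how `google-generativeai` accepts mixed content. This is a sketch only: the helper function, file names, and transcript are hypothetical placeholders, and the API call runs only when a key is configured.

```python
import os

def build_interleaved_prompt(examples, target_audio):
    """examples: (audio_part, transcript) pairs; returns a mixed content list
    that interleaves few-shot audio examples with their transcripts."""
    contents = []
    for audio, transcript in examples:
        contents.append("Example audio:")
        contents.append(audio)
        contents.append(f"Transcript: {transcript}")
    contents.append("Now transcribe this clip in the same style:")
    contents.append(target_audio)
    return contents

if os.getenv("GOOGLE_API_KEY"):  # only call the API when a key is configured
    import google.generativeai as genai

    example = genai.upload_file("example.wav")  # hypothetical paths
    target = genai.upload_file("target.wav")
    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(
        build_interleaved_prompt([(example, "the quick brown fox")], target)
    )
    print(response.text)
```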
Caching
Prompt caching is now supported on both the Anthropic and Gemini clients. This opens up a variety of techniques which were previously prohibitively expensive.
Contextual Retrieval with Anthropic
Anthropic recently outlined a technique called Contextual Retrieval that takes advantage of their prompt caching. By using Haiku to generate new context for each chunk, with a prompt that explains the chunk in the context of the overall document, they managed to significantly improve retrieval performance.
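The core of the technique can be sketched in a few lines: for every chunk, ask a small model to situate the chunk within the full document, then index the generated context together with the chunk. The prompt below is paraphrased from Anthropic's post, and the model call is injectable (stubbed here) so a real Haiku call, with the document prefix cached, can be plugged in.

```python
# Prompt paraphrased from Anthropic's Contextual Retrieval post.
CONTEXT_PROMPT = """<document>
{document}
</document>
Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>
Give a short, succinct context to situate this chunk within the overall
document for the purposes of improving search retrieval of the chunk."""

def contextualize(document: str, chunks: list[str], llm) -> list[str]:
    """Prefix each chunk with model-generated situating context before indexing."""
    contextualized = []
    for chunk in chunks:
        context = llm(CONTEXT_PROMPT.format(document=document, chunk=chunk))
        contextualized.append(f"{context}\n\n{chunk}")
    return contextualized

# Stubbed model call for illustration; swap in a real (cached) Haiku call.
def fake_llm(prompt: str) -> str:
    return "This chunk is from the Q2 section of the report."

print(contextualize("...full document text...", ["Revenue grew 3%."], fake_llm))
```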
We've written up a small example of how to implement this in Instructor here.
Gemini Caching
With our new Gemini support, we can also take advantage of Google's caching capabilities, which extend to audio, video, image and text content.

This is tremendously useful if we're extracting structured data from these formats (read more about caching with Gemini here).
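As a rough sketch of what explicit caching looks like with `google-generativeai`: upload the media once, create a cache that holds it, then construct a model from the cached content. The model name, file path, and TTL below are illustrative, and the API calls run only when a key is configured.

```python
import datetime
import os

# How long the cached content should live (illustrative).
CACHE_TTL = datetime.timedelta(hours=1)

if os.getenv("GOOGLE_API_KEY"):  # only call the API when a key is configured
    import google.generativeai as genai
    from google.generativeai import caching

    # Upload the long audio file once via the files API (path is hypothetical)...
    podcast = genai.upload_file("podcast.mp3")

    # ...then create an explicit cache holding it, and query against the cache.
    cache = caching.CachedContent.create(
        model="models/gemini-1.5-flash-001",
        contents=[podcast],
        ttl=CACHE_TTL,
    )
    model = genai.GenerativeModel.from_cached_content(cached_content=cache)
    print(model.generate_content("Summarise the podcast in one paragraph.").text)
```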
As a simple example, we can cache a 3-hour podcast that we uploaded using the files API and then verify cache usage with the `create_with_completion` method.

We can adapt the original Contextual Retrieval strategy from Anthropic, as seen above, with two key changes. First, we need to pass in the client itself. Secondly, we need to initialise the cache manually and specify the model ahead of time. You'll also need to use the `GEMINI_JSON` mode, since we can't use tool calling with the cache.

Structured Outputs with Gemini
We're excited to announce that instructor now supports structured outputs using tool calling for both the Gemini SDK and the VertexAI SDK. It's important to note that this is solely for text-only input at the moment, since function calls are currently in beta for the Gemini API; read more about it here.

All you need to do is invoke the `from_gemini` or `from_vertexai` methods and you're good to go. Let's see an example of how to use this new mode.
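Here is a minimal sketch using the Gemini SDK; the model name and prompt are illustrative, and the API call runs only when a key is configured.

```python
import os
from pydantic import BaseModel

# The response model that the structured output is validated against.
class User(BaseModel):
    name: str
    age: int

if os.getenv("GOOGLE_API_KEY"):  # only call the API when a key is configured
    import instructor
    import google.generativeai as genai

    client = instructor.from_gemini(
        client=genai.GenerativeModel(model_name="models/gemini-1.5-flash-latest"),
    )
    user = client.messages.create(
        messages=[{"role": "user", "content": "Jason is 25 years old."}],
        response_model=User,
    )
    print(user)
```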
What’s Next?
Expect a lot of great new features to come as we continue to build out the instructor library.