Renamings, tests, docs
andrewfrench committed May 14, 2024
1 parent a973dc7 commit 7b85d66
Showing 33 changed files with 265 additions and 106 deletions.
6 changes: 5 additions & 1 deletion docs/griptape-framework/data/artifacts.md
Expand Up @@ -35,4 +35,8 @@ Each blob has a [name](../../reference/griptape/artifacts/base_artifact.md#gript

## ImageArtifact

An [ImageArtifact](../../reference/griptape/artifacts/image_artifact.md) is used for passing images back to the LLM. In addition to binary image data, an ImageArtifact includes image metadata like MIME type, dimensions, and prompt and model information for images returned by [image generation Drivers](../drivers/image-generation-drivers.md). It inherits from [BlobArtifact](#blobartifact).
An [ImageArtifact](../../reference/griptape/artifacts/image_artifact.md) is used for passing images back to the LLM. In addition to binary image data, an Image Artifact includes image metadata like MIME type, dimensions, and prompt and model information for images returned by [image generation Drivers](../drivers/image-generation-drivers.md). It inherits from [BlobArtifact](#blobartifact).

## AudioArtifact

An [AudioArtifact](../../reference/griptape/artifacts/audio_artifact.md) allows the Framework to interact with audio content. An Audio Artifact includes binary audio content as well as metadata like format, duration, and prompt and model information for audio returned by generative models. It inherits from [BlobArtifact](#blobartifact).
33 changes: 33 additions & 0 deletions docs/griptape-framework/drivers/text-to-speech-drivers.md
@@ -0,0 +1,33 @@
## Overview

[Text to Speech Drivers](../../reference/griptape/drivers/text_to_speech/index.md) are used by [Text To Speech Engines](../engines/audio/text-to-speech-engine.md) to build and execute API calls to audio generation models.

Provide a Driver when building an [Engine](../engines/audio-generation-engines.md), then pass it to a [Tool](../tools/index.md) for use by an [Agent](../structures/agents.md):

### Eleven Labs

The [Eleven Labs Text to Speech Driver](../../reference/griptape/drivers/text_to_speech/elevenlabs_text_to_speech_driver.md) provides support for text-to-speech models hosted by Eleven Labs. This Driver supports configurations specific to Eleven Labs, like voice selection and output format.

```python
import os

from griptape.drivers import ElevenLabsTextToSpeechDriver
from griptape.engines import TextToSpeechEngine
from griptape.tools.text_to_speech_client.tool import TextToSpeechClient
from griptape.structures import Agent


driver = ElevenLabsTextToSpeechDriver(
    api_key=os.getenv("ELEVEN_LABS_API_KEY"),
    model="eleven_multilingual_v2",
    voice="Matilda",
)

tool = TextToSpeechClient(
    engine=TextToSpeechEngine(
        text_to_speech_driver=driver,
    ),
)

Agent(tools=[tool]).run("Generate audio from this text: 'Hello, world!'")
```
29 changes: 29 additions & 0 deletions docs/griptape-framework/engines/audio-engines.md
@@ -0,0 +1,29 @@
## Overview

[Audio Generation Engines](../../reference/griptape/engines/audio/index.md) facilitate audio generation. Each Audio Generation Engine provides a `run` method that accepts the inputs required for its particular mode and passes the request to the configured [Driver](../drivers/text-to-speech/index.md).

### Text to Speech Engine

This Engine facilitates synthesizing speech from text inputs.

```python
import os

from griptape.drivers import ElevenLabsTextToSpeechDriver
from griptape.engines import TextToSpeechEngine


driver = ElevenLabsTextToSpeechDriver(
    api_key=os.getenv("ELEVEN_LABS_API_KEY"),
    model="eleven_multilingual_v2",
    voice="Rachel",
)

engine = TextToSpeechEngine(
    text_to_speech_driver=driver,
)

engine.run(
    prompts=["Hello, world!"],
)
```
27 changes: 27 additions & 0 deletions docs/griptape-tools/official-tools/text-to-speech-client.md
@@ -0,0 +1,27 @@
# TextToSpeechClient

This tool enables LLMs to synthesize speech from text using [Text to Speech Engines](../../reference/griptape/engines/audio/text_to_speech_engine.md) and [Text to Speech Drivers](../../reference/griptape/drivers/text_to_speech/index.md).

```python
import os

from griptape.drivers import ElevenLabsTextToSpeechDriver
from griptape.engines import TextToSpeechEngine
from griptape.tools.text_to_speech_client.tool import TextToSpeechClient
from griptape.structures import Agent


driver = ElevenLabsTextToSpeechDriver(
    api_key=os.getenv("ELEVEN_LABS_API_KEY"),
    model="eleven_multilingual_v2",
    voice="Matilda",
)

tool = TextToSpeechClient(
    engine=TextToSpeechEngine(
        text_to_speech_driver=driver,
    ),
)

Agent(tools=[tool]).run("Generate audio from this text: 'Hello, world!'")
```
8 changes: 4 additions & 4 deletions griptape/config/structure_global_drivers_config.py
Expand Up @@ -14,9 +14,9 @@
DummyPromptDriver,
DummyImageQueryDriver,
BaseImageQueryDriver,
BaseAudioGenerationDriver,
BaseTextToSpeechDriver,
)
from griptape.drivers.audio_generation.dummy_audio_generation_driver import DummyAudioGenerationDriver
from griptape.drivers.text_to_speech.dummy_text_to_speech_driver import DummyTextToSpeechDriver
from griptape.mixins.serializable_mixin import SerializableMixin


Expand All @@ -40,6 +40,6 @@ class StructureGlobalDriversConfig(SerializableMixin):
conversation_memory_driver: Optional[BaseConversationMemoryDriver] = field(
default=None, kw_only=True, metadata={"serializable": True}
)
audio_generation_driver: BaseAudioGenerationDriver = field(
default=Factory(lambda: DummyAudioGenerationDriver()), kw_only=True, metadata={"serializable": True}
audio_generation_driver: BaseTextToSpeechDriver = field(
default=Factory(lambda: DummyTextToSpeechDriver()), kw_only=True, metadata={"serializable": True}
)
12 changes: 6 additions & 6 deletions griptape/drivers/__init__.py
Expand Up @@ -97,9 +97,9 @@
from .file_manager.local_file_manager_driver import LocalFileManagerDriver
from .file_manager.amazon_s3_file_manager_driver import AmazonS3FileManagerDriver

from .audio_generation.base_audio_generation_driver import BaseAudioGenerationDriver
from .audio_generation.dummy_audio_generation_driver import DummyAudioGenerationDriver
from .audio_generation.elevenlabs_audio_generation_driver import ElevenLabsAudioGenerationDriver
from .text_to_speech.base_text_to_speech_driver import BaseTextToSpeechDriver
from .text_to_speech.dummy_text_to_speech_driver import DummyTextToSpeechDriver
from .text_to_speech.elevenlabs_text_to_speech_driver import ElevenLabsTextToSpeechDriver

__all__ = [
"BasePromptDriver",
Expand Down Expand Up @@ -185,7 +185,7 @@
"BaseFileManagerDriver",
"LocalFileManagerDriver",
"AmazonS3FileManagerDriver",
"BaseAudioGenerationDriver",
"DummyAudioGenerationDriver",
"ElevenLabsAudioGenerationDriver",
"BaseTextToSpeechDriver",
"DummyTextToSpeechDriver",
"ElevenLabsTextToSpeechDriver",
]
File renamed without changes.
Expand Up @@ -15,7 +15,7 @@


@define
class BaseAudioGenerationDriver(SerializableMixin, ExponentialBackoffMixin, ABC):
class BaseTextToSpeechDriver(SerializableMixin, ExponentialBackoffMixin, ABC):
model: str = field(kw_only=True, metadata={"serializable": True})
structure: Optional[Structure] = field(default=None, kw_only=True)

@@ -1,12 +1,12 @@
from typing import Optional
from attrs import define, field
from griptape.artifacts.audio_artifact import AudioArtifact
from griptape.drivers import BaseAudioGenerationDriver
from griptape.drivers import BaseTextToSpeechDriver
from griptape.exceptions import DummyException


@define
class DummyAudioGenerationDriver(BaseAudioGenerationDriver):
class DummyTextToSpeechDriver(BaseTextToSpeechDriver):
model: str = field(init=False)

def try_text_to_audio(self, prompts: list[str], negative_prompts: Optional[list[str]] = None) -> AudioArtifact:
Expand Up @@ -5,12 +5,14 @@
from attr import define, field, Factory

from griptape.artifacts.audio_artifact import AudioArtifact
from griptape.drivers.audio_generation.base_audio_generation_driver import BaseAudioGenerationDriver
from elevenlabs.client import ElevenLabs
from griptape.drivers import BaseTextToSpeechDriver

if TYPE_CHECKING:
from elevenlabs.client import ElevenLabs


@define
class ElevenLabsAudioGenerationDriver(BaseAudioGenerationDriver):
class ElevenLabsTextToSpeechDriver(BaseTextToSpeechDriver):
api_key: str = field(kw_only=True, metadata={"serializable": True})
client: Any = field(
default=Factory(lambda self: ElevenLabs(api_key=self.api_key), takes_self=True),
Expand All @@ -29,4 +31,8 @@ def try_text_to_audio(self, prompts: list[str], negative_prompts: Optional[list[
for chunk in audio:
content += chunk

return AudioArtifact(value=content, format="mpeg")
# All ElevenLabs audio format strings have the following structure:
# {format}_{sample_rate}_{bitrate}
artifact_format = self.output_format.split("_")[0]

return AudioArtifact(value=content, format=artifact_format)
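
As a standalone illustration of the format derivation in this hunk, the following minimal sketch isolates the string handling. The helper name `artifact_format_from_output_format` and the sample value `"mp3_44100_128"` are assumptions made for illustration; only the `{format}_{sample_rate}_{bitrate}` structure and the `split("_")[0]` call come from the diff.

```python
def artifact_format_from_output_format(output_format: str) -> str:
    """Return the leading format token of an ElevenLabs-style output format string."""
    # Hypothetical helper: mirrors `self.output_format.split("_")[0]` from the driver.
    # Everything before the first underscore is treated as the audio format.
    return output_format.split("_")[0]


if __name__ == "__main__":
    # Illustrative value following the {format}_{sample_rate}_{bitrate} structure.
    print(artifact_format_from_output_format("mp3_44100_128"))
```

With this approach the artifact's `format` follows whatever output format the driver is configured with, rather than the previously hard-coded `"mpeg"`.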
6 changes: 2 additions & 4 deletions griptape/engines/__init__.py
Expand Up @@ -11,8 +11,7 @@
from .image.inpainting_image_generation_engine import InpaintingImageGenerationEngine
from .image.outpainting_image_generation_engine import OutpaintingImageGenerationEngine
from .image_query.image_query_engine import ImageQueryEngine
from .audio.base_audio_generation_engine import BaseAudioGenerationEngine
from .audio.text_to_audio_generation_engine import TextToAudioGenerationEngine
from .audio.text_to_speech_engine import TextToSpeechEngine

__all__ = [
"BaseQueryEngine",
Expand All @@ -28,6 +27,5 @@
"InpaintingImageGenerationEngine",
"OutpaintingImageGenerationEngine",
"ImageQueryEngine",
"BaseAudioGenerationEngine",
"TextToAudioGenerationEngine",
"TextToSpeechEngine",
]
19 changes: 0 additions & 19 deletions griptape/engines/audio/base_audio_generation_engine.py

This file was deleted.

12 changes: 0 additions & 12 deletions griptape/engines/audio/text_to_audio_generation_engine.py

This file was deleted.

14 changes: 14 additions & 0 deletions griptape/engines/audio/text_to_speech_engine.py
@@ -0,0 +1,14 @@
from __future__ import annotations

from attr import define, field

from griptape.artifacts.audio_artifact import AudioArtifact
from griptape.drivers import BaseTextToSpeechDriver


@define
class TextToSpeechEngine:
    text_to_speech_driver: BaseTextToSpeechDriver = field(kw_only=True)

    def run(self, prompts: list[str], *args, **kwargs) -> AudioArtifact:
        return self.text_to_speech_driver.try_text_to_audio(prompts=prompts)
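
The new engine is a thin wrapper that forwards `run` to the driver's `try_text_to_audio`. The sketch below shows that delegation in isolation, assuming nothing about griptape itself: `FakeAudioArtifact`, `FakeTextToSpeechDriver`, and `SketchTextToSpeechEngine` are hypothetical stand-ins, and the fake driver returns the joined prompt text as bytes instead of calling any API.

```python
from dataclasses import dataclass


@dataclass
class FakeAudioArtifact:
    # Stand-in for griptape's AudioArtifact; not the real class.
    value: bytes
    format: str


class FakeTextToSpeechDriver:
    # Stand-in for a BaseTextToSpeechDriver subclass.
    # Returns canned bytes (the encoded prompts) rather than real audio.
    def try_text_to_audio(self, prompts: list[str]) -> FakeAudioArtifact:
        return FakeAudioArtifact(value=" ".join(prompts).encode("utf-8"), format="mp3")


@dataclass
class SketchTextToSpeechEngine:
    # Mirrors the shape of TextToSpeechEngine: holds a driver and forwards `run` to it.
    text_to_speech_driver: FakeTextToSpeechDriver

    def run(self, prompts: list[str], *args, **kwargs) -> FakeAudioArtifact:
        # Extra positional and keyword arguments are accepted but not forwarded,
        # matching the signature shown in the diff.
        return self.text_to_speech_driver.try_text_to_audio(prompts=prompts)


if __name__ == "__main__":
    engine = SketchTextToSpeechEngine(text_to_speech_driver=FakeTextToSpeechDriver())
    artifact = engine.run(prompts=["Hello, world!"])
    print(artifact.format, len(artifact.value))
```

Because the engine only depends on the driver's `try_text_to_audio` method, a fake driver like this one may be handy in tests that should not reach an external service.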
4 changes: 2 additions & 2 deletions griptape/mixins/__init__.py
Expand Up @@ -3,13 +3,13 @@
from .actions_subtask_origin_mixin import ActionsSubtaskOriginMixin
from .rule_mixin import RuleMixin
from .serializable_mixin import SerializableMixin
from .media_artifact_file_output_mixin import MediaArtifactFileOutputMixin
from .media_artifact_file_output_mixin import BlobArtifactFileOutputMixin

__all__ = [
"ActivityMixin",
"ExponentialBackoffMixin",
"ActionsSubtaskOriginMixin",
"RuleMixin",
"MediaArtifactFileOutputMixin",
"BlobArtifactFileOutputMixin",
"SerializableMixin",
]
6 changes: 3 additions & 3 deletions griptape/mixins/media_artifact_file_output_mixin.py
Expand Up @@ -7,11 +7,11 @@
from typing import Optional

if TYPE_CHECKING:
from griptape.artifacts import MediaArtifact
from griptape.artifacts import BlobArtifact


@define(slots=False)
class MediaArtifactFileOutputMixin:
class BlobArtifactFileOutputMixin:
output_dir: Optional[str] = field(default=None, kw_only=True)
output_file: Optional[str] = field(default=None, kw_only=True)

Expand All @@ -31,7 +31,7 @@ def validate_output_file(self, _, output_file: str) -> None:
if self.output_dir:
raise ValueError("Can't have both output_dir and output_file specified.")

def _write_to_file(self, artifact: MediaArtifact) -> None:
def _write_to_file(self, artifact: BlobArtifact) -> None:
if self.output_file:
outfile = self.output_file
elif self.output_dir:
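
The renamed mixin keeps the rule that `output_dir` and `output_file` are mutually exclusive. A minimal sketch of that rule, using a plain dataclass rather than the `attrs` validators in the real mixin, might look like the following. The class name `OutputTargetSketch` is hypothetical, and the file-naming logic of `_write_to_file` is deliberately left out because it is not visible in this hunk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutputTargetSketch:
    # Hypothetical stand-in for BlobArtifactFileOutputMixin's two fields.
    output_dir: Optional[str] = None
    output_file: Optional[str] = None

    def __post_init__(self) -> None:
        # Same rule as the mixin's validators: only one target may be set.
        if self.output_dir and self.output_file:
            raise ValueError("Can't have both output_dir and output_file specified.")


if __name__ == "__main__":
    print(OutputTargetSketch(output_file="speech.mp3"))
    try:
        OutputTargetSketch(output_dir="out", output_file="speech.mp3")
    except ValueError as error:
        print(f"rejected: {error}")
```

Unlike the real mixin, which validates each field through `attrs`, this sketch checks both fields once after construction.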
4 changes: 2 additions & 2 deletions griptape/tasks/__init__.py
Expand Up @@ -17,7 +17,7 @@
from .variation_image_generation_task import VariationImageGenerationTask
from .image_query_task import ImageQueryTask
from .base_audio_generation_task import BaseAudioGenerationTask
from .audio_generation_task import AudioGenerationTask
from .text_to_speech_task import TextToSpeechTask

__all__ = [
"BaseTask",
Expand All @@ -39,5 +39,5 @@
"OutpaintingImageGenerationTask",
"ImageQueryTask",
"BaseAudioGenerationTask",
"AudioGenerationTask",
"TextToSpeechTask",
]
4 changes: 2 additions & 2 deletions griptape/tasks/base_audio_generation_task.py
Expand Up @@ -7,11 +7,11 @@

from griptape.artifacts import MediaArtifact
from griptape.loaders import ImageLoader
from griptape.mixins import RuleMixin, MediaArtifactFileOutputMixin
from griptape.mixins import RuleMixin, BlobArtifactFileOutputMixin
from griptape.rules import Ruleset, Rule
from griptape.tasks import BaseTask


@define
class BaseAudioGenerationTask(MediaArtifactFileOutputMixin, RuleMixin, BaseTask, ABC):
class BaseAudioGenerationTask(BlobArtifactFileOutputMixin, RuleMixin, BaseTask, ABC):
...
4 changes: 2 additions & 2 deletions griptape/tasks/base_image_generation_task.py
Expand Up @@ -7,13 +7,13 @@

from griptape.artifacts import MediaArtifact
from griptape.loaders import ImageLoader
from griptape.mixins import RuleMixin, MediaArtifactFileOutputMixin
from griptape.mixins import RuleMixin, BlobArtifactFileOutputMixin
from griptape.rules import Ruleset, Rule
from griptape.tasks import BaseTask


@define
class BaseImageGenerationTask(MediaArtifactFileOutputMixin, RuleMixin, BaseTask, ABC):
class BaseImageGenerationTask(BlobArtifactFileOutputMixin, RuleMixin, BaseTask, ABC):
"""Provides a base class for image generation-related tasks.
Attributes: