Knowledge #1567

Merged
merged 49 commits into from
Nov 20, 2024
49 commits
75322b2
initial knowledge
joaomdmoura Nov 4, 2024
dc314c1
Merge branch 'main' into knowledge
bhancockio Nov 4, 2024
a8a2f80
WIP
bhancockio Nov 5, 2024
1a35114
Adding core knowledge sources
bhancockio Nov 6, 2024
6131dba
Improve types and better support for file paths
bhancockio Nov 6, 2024
617ee98
added additional sources
bhancockio Nov 6, 2024
4af263c
Merge branch 'main' into knowledge
bhancockio Nov 7, 2024
59165cb
fix linting
bhancockio Nov 7, 2024
86ede83
update yaml to include optional deps
bhancockio Nov 7, 2024
7b59c5b
adding in lorenze feedback
bhancockio Nov 7, 2024
98a708c
Merge branch 'main' of github.com:crewAIInc/crewAI into knowledge
lorenzejay Nov 14, 2024
10f445e
ensure embeddings are persisted
lorenzejay Nov 15, 2024
cb03ee6
improvements all around Knowledge class
lorenzejay Nov 15, 2024
cdf5233
Merge branch 'main' of github.com:crewAIInc/crewAI into knowledge
lorenzejay Nov 15, 2024
b907938
return this
lorenzejay Nov 15, 2024
352d053
properly reset memory
lorenzejay Nov 18, 2024
b2c06d5
properly reset memory+knowledge
lorenzejay Nov 18, 2024
cbfcde7
consolodation and improvements
lorenzejay Nov 18, 2024
4831dcb
Merge branch 'main' of github.com:crewAIInc/crewAI into knowledge
lorenzejay Nov 18, 2024
d579c5a
linted
lorenzejay Nov 18, 2024
b104404
cleanup rm unused embedder
lorenzejay Nov 19, 2024
70910dd
fix test
lorenzejay Nov 19, 2024
c8bf242
fix duplicate
lorenzejay Nov 19, 2024
cbfdbe3
generating cassettes for knowledge test
lorenzejay Nov 19, 2024
e882725
updated default embedder
lorenzejay Nov 19, 2024
efa8a37
None embedder to use default on pipeline cloning
lorenzejay Nov 19, 2024
de742c8
improvements
lorenzejay Nov 19, 2024
914067d
fixed text_file_knowledge
lorenzejay Nov 19, 2024
0c5b6f2
mypysrc fixes
lorenzejay Nov 19, 2024
705ee16
type check fixes
lorenzejay Nov 19, 2024
58bf2d5
added extra cassette
lorenzejay Nov 19, 2024
ec2fe6f
just mocks
lorenzejay Nov 19, 2024
8373c9b
linted
lorenzejay Nov 19, 2024
e7d816f
Merge branch 'main' of github.com:crewAIInc/crewAI into knowledge
lorenzejay Nov 19, 2024
787f2ea
mock knowledge query to not spin up db
lorenzejay Nov 20, 2024
b185b9e
linted
lorenzejay Nov 20, 2024
4663997
verbose run
lorenzejay Nov 20, 2024
76da972
put a flag
lorenzejay Nov 20, 2024
fe18da5
fix
lorenzejay Nov 20, 2024
23276cb
adding docs
lorenzejay Nov 20, 2024
3c4504b
better docs
lorenzejay Nov 20, 2024
44ab749
improvements from review
lorenzejay Nov 20, 2024
52189a4
more docs
lorenzejay Nov 20, 2024
8a54042
linted
lorenzejay Nov 20, 2024
8564f55
rm print
lorenzejay Nov 20, 2024
38c0d61
more fixes
lorenzejay Nov 20, 2024
9329119
clearer docs
lorenzejay Nov 20, 2024
6359b64
added docstrings and type hints for cli
lorenzejay Nov 20, 2024
c0ad457
Merge branch 'main' of github.com:crewAIInc/crewAI into knowledge
lorenzejay Nov 20, 2024
4 changes: 2 additions & 2 deletions .github/workflows/tests.yml
@@ -26,7 +26,7 @@ jobs:
run: uv python install 3.11.9

- name: Install the project
run: uv sync --dev
run: uv sync --dev --all-extras
Collaborator

Question: Do you think we need the --all-extras option in this case? It seems like we'll have to install all the optional dependencies to be able to run our tests. What do you think?

Collaborator

Yes, this PR adds several optional dependencies, such as pdfplumber for our PdfKnowledgeSource.


- name: Run tests
run: uv run pytest tests
run: uv run pytest tests -vv
75 changes: 75 additions & 0 deletions docs/concepts/knowledge.mdx
@@ -0,0 +1,75 @@
---
title: Knowledge
description: What is knowledge in CrewAI and how to use it.
icon: book
---

# Using Knowledge in CrewAI

## Introduction

The Knowledge class in CrewAI lets you manage and query knowledge sources for your AI agents. This guide shows how to add knowledge management to your CrewAI projects.
CrewAI also provides ready-made knowledge sources for strings, text files, PDFs, and spreadsheets, and you can support other source types by extending the `KnowledgeSource` class.

## Basic Implementation

Here's a simple example that attaches a knowledge source to a crew (the crew converts the `knowledge` dictionary into a `Knowledge` object):

```python
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create a knowledge source
content = "User's name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(
    content=content, metadata={"preference": "personal"}
)

llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent; the knowledge itself is attached to the crew below
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="""You are a master at understanding people and their preferences.""",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)
task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    # Enable knowledge by listing its sources; you can add more sources to the list.
    knowledge={"sources": [string_source], "metadata": {"preference": "personal"}},
)

result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})
```


## Embedder Configuration

You can also configure the embedder for the knowledge store. This is useful if you want to use a different embedder for the knowledge store than the one used for the agents.

```python
...
string_source = StringKnowledgeSource(
    content="User's name is John. He is 30 years old and lives in San Francisco.",
    metadata={"preference": "personal"},
)
crew = Crew(
    ...,
    knowledge={
        "sources": [string_source],
        "metadata": {"preference": "personal"},
        "embedder_config": {"provider": "openai", "config": {"model": "text-embedding-3-small"}},
    },
)
```
10 changes: 10 additions & 0 deletions pyproject.toml
@@ -39,6 +39,16 @@ Repository = "https://github.com/crewAIInc/crewAI"
[project.optional-dependencies]
tools = ["crewai-tools>=0.14.0"]
agentops = ["agentops>=0.3.0"]
fastembed = ["fastembed>=0.4.1"]
pdfplumber = [
"pdfplumber>=0.11.4",
]
pandas = [
Collaborator

Suggestion: I'm wondering whether we need to keep "pandas" as an optional dependency. From the code, it seems we only use it to read Excel files and save them as CSVs. Maybe a lighter library could handle that? Just a thought!

If the lib is still required, maybe we should go with "polars":

Polars: ~8.5MB
Pandas: ~12MB

Polars: ~70ms
NumPy: ~104ms
Pandas: ~520ms

Collaborator

These are optional deps; maybe this can be a fast follow?

"pandas>=2.2.3",
]
openpyxl = [
"openpyxl>=3.1.5",
]
mem0 = ["mem0ai>=0.1.29"]

[tool.uv]
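The extras added in this hunk are only installed on request (for example with `pip install "crewai[pdfplumber]"`), so code that depends on them has to cope with their absence. A minimal sketch of one common guard, assuming each extra's import name matches its package name: probe with `importlib.util.find_spec` before importing, and tell the user which extra to install. `OPTIONAL_EXTRAS`, `is_installed`, and `require_extra` are hypothetical helpers, not CrewAI APIs.

```python
import importlib.util
from typing import Dict

# Extras declared in this hunk, mapped to the module each one provides (assumed).
OPTIONAL_EXTRAS: Dict[str, str] = {
    "fastembed": "fastembed",
    "pdfplumber": "pdfplumber",
    "pandas": "pandas",
    "openpyxl": "openpyxl",
}


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_extra(extra: str, module_name: str) -> None:
    """Raise ImportError naming the extra to install when module_name is missing."""
    if not is_installed(module_name):
        raise ImportError(
            f"'{module_name}' is not installed. "
            f"Install it with: pip install \"crewai[{extra}]\""
        )


availability = {extra: is_installed(module) for extra, module in OPTIONAL_EXTRAS.items()}
print(availability)
```

Probing with `find_spec` defers the import cost of heavier packages such as pandas until a feature actually needs them.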
14 changes: 13 additions & 1 deletion src/crewai/__init__.py
@@ -1,7 +1,9 @@
import warnings

from crewai.agent import Agent
from crewai.crew import Crew
from crewai.flow.flow import Flow
from crewai.knowledge.knowledge import Knowledge
from crewai.llm import LLM
from crewai.pipeline import Pipeline
from crewai.process import Process
@@ -15,4 +17,14 @@
module="pydantic.main",
)
__version__ = "0.80.0"
__all__ = ["Agent", "Crew", "Process", "Task", "Pipeline", "Router", "LLM", "Flow"]
__all__ = [
"Agent",
"Crew",
"Process",
"Task",
"Pipeline",
"Router",
"LLM",
"Flow",
"Knowledge",
]
15 changes: 14 additions & 1 deletion src/crewai/agent.py
@@ -11,8 +11,8 @@
from crewai.cli.constants import ENV_VARS
from crewai.llm import LLM
from crewai.memory.contextual.contextual_memory import ContextualMemory
from crewai.tools.agent_tools.agent_tools import AgentTools
from crewai.tools import BaseTool
from crewai.tools.agent_tools.agent_tools import AgentTools
from crewai.utilities import Converter, Prompts
from crewai.utilities.constants import TRAINED_AGENTS_DATA_FILE, TRAINING_DATA_FILE
from crewai.utilities.token_counter_callback import TokenCalcHandler
@@ -52,6 +52,7 @@ class Agent(BaseAgent):
role: The role of the agent.
goal: The objective of the agent.
backstory: The backstory of the agent.
knowledge: The knowledge base of the agent.
config: Dict representation of agent configuration.
llm: The language model that will run the agent.
function_calling_llm: The language model that will handle the tool calling for this agent, it overrides the crew function_calling_llm.
@@ -272,6 +273,18 @@ def execute_task(
if memory.strip() != "":
task_prompt += self.i18n.slice("memory").format(memory=memory)

# Integrate the knowledge base
if self.crew and self.crew.knowledge:
knowledge_snippets = self.crew.knowledge.query([task.prompt()])
valid_snippets = [
result["context"]
for result in knowledge_snippets
if result and result.get("context")
]
if valid_snippets:
formatted_knowledge = "\n".join(valid_snippets)
task_prompt += f"\n\nAdditional Information:\n{formatted_knowledge}"

tools = tools or self.tools or []
self.create_agent_executor(tools=tools, task=task)
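The knowledge-integration hunk in `execute_task` filters the query results and appends the usable ones to the task prompt. The standalone sketch below reproduces that logic outside CrewAI so it can be tested in isolation. `append_knowledge` is a hypothetical helper name, and it assumes, as the diff does, that `knowledge.query()` returns a list whose entries are either falsy or dicts that may carry a `"context"` key.

```python
from typing import Any, Dict, List, Optional


def append_knowledge(task_prompt: str, snippets: List[Optional[Dict[str, Any]]]) -> str:
    """Append non-empty knowledge contexts to a task prompt.

    Mirrors the diff: skip falsy results and results whose "context" is
    missing or empty, then join the remaining contexts with newlines.
    """
    valid_snippets = [
        result["context"]
        for result in snippets
        if result and result.get("context")
    ]
    if valid_snippets:
        formatted_knowledge = "\n".join(valid_snippets)
        task_prompt += f"\n\nAdditional Information:\n{formatted_knowledge}"
    return task_prompt


# Two usable snippets, one missing result, and one with an empty context.
results = [
    {"context": "John is 30 years old."},
    None,
    {"context": ""},
    {"context": "John lives in San Francisco."},
]
prompt = append_knowledge("Answer the question about the user.", results)
print(prompt)
```

When every result is empty, the prompt comes back unchanged, so a crew with knowledge configured but no matching chunks does not get an empty "Additional Information" section.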

14 changes: 11 additions & 3 deletions src/crewai/cli/cli.py
@@ -136,24 +136,32 @@ def log_tasks_outputs() -> None:
@click.option("-l", "--long", is_flag=True, help="Reset LONG TERM memory")
@click.option("-s", "--short", is_flag=True, help="Reset SHORT TERM memory")
@click.option("-e", "--entities", is_flag=True, help="Reset ENTITIES memory")
@click.option("-kn", "--knowledge", is_flag=True, help="Reset KNOWLEDGE storage")
@click.option(
"-k",
"--kickoff-outputs",
is_flag=True,
help="Reset LATEST KICKOFF TASK OUTPUTS",
)
@click.option("-a", "--all", is_flag=True, help="Reset ALL memories")
def reset_memories(long, short, entities, kickoff_outputs, all):
def reset_memories(
long: bool,
short: bool,
entities: bool,
knowledge: bool,
kickoff_outputs: bool,
all: bool,
) -> None:
"""
Reset the crew memories (long, short, entity, latest_crew_kickoff_ouputs). This will delete all the data saved.
"""
try:
if not all and not (long or short or entities or kickoff_outputs):
if not all and not (long or short or entities or knowledge or kickoff_outputs):
click.echo(
"Please specify at least one memory type to reset using the appropriate flags."
)
return
reset_memories_command(long, short, entities, kickoff_outputs, all)
reset_memories_command(long, short, entities, knowledge, kickoff_outputs, all)
except Exception as e:
click.echo(f"An error occurred while resetting memories: {e}", err=True)

15 changes: 14 additions & 1 deletion src/crewai/cli/reset_memories_command.py
@@ -5,9 +5,17 @@
from crewai.memory.long_term.long_term_memory import LongTermMemory
from crewai.memory.short_term.short_term_memory import ShortTermMemory
from crewai.utilities.task_output_storage_handler import TaskOutputStorageHandler
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage


def reset_memories_command(long, short, entity, kickoff_outputs, all) -> None:
def reset_memories_command(
long,
short,
entity,
knowledge,
kickoff_outputs,
all,
) -> None:
"""
Reset the crew memories.

@@ -17,6 +25,7 @@ def reset_memories_command(long, short, entity, kickoff_outputs, all) -> None:
entity (bool): Whether to reset the entity memory.
kickoff_outputs (bool): Whether to reset the latest kickoff task outputs.
all (bool): Whether to reset all memories.
knowledge (bool): Whether to reset the knowledge.
"""

try:
@@ -25,6 +34,7 @@ def reset_memories_command(long, short, entity, kickoff_outputs, all) -> None:
EntityMemory().reset()
LongTermMemory().reset()
TaskOutputStorageHandler().reset()
KnowledgeStorage().reset()
click.echo("All memories have been reset.")
else:
if long:
@@ -40,6 +50,9 @@ def reset_memories_command(long, short, entity, kickoff_outputs, all) -> None:
if kickoff_outputs:
TaskOutputStorageHandler().reset()
click.echo("Latest Kickoff outputs stored has been reset.")
if knowledge:
KnowledgeStorage().reset()
click.echo("Knowledge has been reset.")

except subprocess.CalledProcessError as e:
click.echo(f"An error occurred while resetting the memories: {e}", err=True)
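Taken together, the `cli.py` and `reset_memories_command.py` changes add a `-kn`/`--knowledge` flag, include it in the "at least one flag" guard, and reset `KnowledgeStorage` either on its own or as part of `--all`. The sketch below models only that selection logic with stand-in objects. `StubStorage` and `reset_selected` are hypothetical names. Where the CLI prints a message and returns when no flag is set, the sketch raises `ValueError` so that case is easy to test.

```python
from typing import Dict, List


class StubStorage:
    """Hypothetical stand-in for a memory or knowledge store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.was_reset = False

    def reset(self) -> None:
        self.was_reset = True


def reset_selected(
    stores: Dict[str, StubStorage], flags: Dict[str, bool], reset_all: bool
) -> List[str]:
    """Reset every store when reset_all is set, otherwise only the flagged ones."""
    if not reset_all and not any(flags.values()):
        # Mirrors the CLI guard; the real command echoes a message instead of raising.
        raise ValueError("Please specify at least one memory type to reset.")
    selected = list(stores) if reset_all else [name for name, on in flags.items() if on]
    for name in selected:
        stores[name].reset()
    return selected


names = ["long", "short", "entities", "knowledge", "kickoff_outputs"]
stores = {name: StubStorage(name) for name in names}
flags = {name: name == "knowledge" for name in names}  # as if only --knowledge was passed
print(reset_selected(stores, flags, reset_all=False))  # ['knowledge']
```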
14 changes: 14 additions & 0 deletions src/crewai/crew.py
@@ -27,6 +27,7 @@
from crewai.memory.entity.entity_memory import EntityMemory
from crewai.memory.long_term.long_term_memory import LongTermMemory
from crewai.memory.short_term.short_term_memory import ShortTermMemory
from crewai.knowledge.knowledge import Knowledge
from crewai.memory.user.user_memory import UserMemory
from crewai.process import Process
from crewai.task import Task
@@ -201,6 +202,10 @@ class Crew(BaseModel):
default=[],
description="List of execution logs for tasks",
)
knowledge: Optional[Dict[str, Any]] = Field(
default=None, description="Knowledge for the crew. Add knowledge sources to the knowledge object."
)


@field_validator("id", mode="before")
@classmethod
@@ -275,6 +280,15 @@ def create_crew_memory(self) -> "Crew":
self._user_memory = None
return self

@model_validator(mode="after")
def create_crew_knowledge(self) -> "Crew":
if self.knowledge:
try:
self.knowledge = Knowledge(**self.knowledge) if isinstance(self.knowledge, dict) else self.knowledge
except (TypeError, ValueError) as e:
raise ValueError(f"Invalid knowledge configuration: {str(e)}")
return self

@model_validator(mode="after")
def check_manager_llm(self):
"""Validates that the language model is set when using hierarchical process."""
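`create_crew_knowledge` accepts either a plain dict or an already-built `Knowledge` object. It converts the dict form with `Knowledge(**self.knowledge)` and re-raises construction errors as `ValueError`. A minimal sketch of the same coercion pattern follows. It uses a hypothetical `KnowledgeConfig` dataclass in place of the real `Knowledge` class, whose fields this diff does not show; the field names are assumptions borrowed from the docs example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union


@dataclass
class KnowledgeConfig:
    """Hypothetical stand-in for crewai's Knowledge class."""

    sources: List[Any]
    metadata: Dict[str, Any] = field(default_factory=dict)
    embedder_config: Optional[Dict[str, Any]] = None


def coerce_knowledge(
    value: Union[None, Dict[str, Any], KnowledgeConfig],
) -> Union[None, Dict[str, Any], KnowledgeConfig]:
    """Build a KnowledgeConfig from a non-empty dict; pass anything else through."""
    if value and isinstance(value, dict):
        try:
            return KnowledgeConfig(**value)
        except (TypeError, ValueError) as e:
            # Unknown or missing keys surface as TypeError from the dataclass __init__.
            raise ValueError(f"Invalid knowledge configuration: {e}") from e
    return value


config = coerce_knowledge({"sources": ["a string source"], "metadata": {"preference": "personal"}})
print(config)
```

Chaining with `from e` records the original error as the explicit cause; the validator in the diff relies on Python's implicit exception context instead.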
Empty file.
Empty file.
55 changes: 55 additions & 0 deletions src/crewai/knowledge/embedder/base_embedder.py
@@ -0,0 +1,55 @@
from abc import ABC, abstractmethod
from typing import List

import numpy as np


class BaseEmbedder(ABC):
    """
    Abstract base class for text embedding models
    """

    @abstractmethod
    def embed_chunks(self, chunks: List[str]) -> np.ndarray:
        """
        Generate embeddings for a list of text chunks

        Args:
            chunks: List of text chunks to embed

        Returns:
            Array of embeddings
        """
        pass

    @abstractmethod
    def embed_texts(self, texts: List[str]) -> np.ndarray:
        """
        Generate embeddings for a list of texts

        Args:
            texts: List of texts to embed

        Returns:
            Array of embeddings
        """
        pass

    @abstractmethod
    def embed_text(self, text: str) -> np.ndarray:
        """
        Generate embedding for a single text

        Args:
            text: Text to embed

        Returns:
            Embedding array
        """
        pass

    @property
    @abstractmethod
    def dimension(self) -> int:
        """Get the dimension of the embeddings"""
        pass
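`BaseEmbedder` only fixes the interface: three embedding methods that return NumPy arrays and a `dimension` property. To show what a concrete subclass has to provide, here is a toy implementation that hashes word tokens into a fixed-size, L2-normalized bag-of-words vector. `HashingEmbedder` is hypothetical and not part of CrewAI. The abstract base is repeated in condensed form so the block runs standalone, and a real embedder would call an embedding model instead.

```python
import hashlib
import re
from abc import ABC, abstractmethod
from typing import List

import numpy as np


class BaseEmbedder(ABC):
    """Condensed copy of the abstract base class from the diff."""

    @abstractmethod
    def embed_chunks(self, chunks: List[str]) -> np.ndarray: ...

    @abstractmethod
    def embed_texts(self, texts: List[str]) -> np.ndarray: ...

    @abstractmethod
    def embed_text(self, text: str) -> np.ndarray: ...

    @property
    @abstractmethod
    def dimension(self) -> int: ...


class HashingEmbedder(BaseEmbedder):
    """Toy embedder: hashes lowercase word tokens into a fixed number of buckets."""

    def __init__(self, dim: int = 64) -> None:
        self._dim = dim

    @property
    def dimension(self) -> int:
        return self._dim

    def embed_text(self, text: str) -> np.ndarray:
        vector = np.zeros(self._dim, dtype=np.float32)
        for token in re.findall(r"\w+", text.lower()):
            # md5 gives a bucket that is stable across runs (built-in hash() is salted).
            bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % self._dim
            vector[bucket] += 1.0
        norm = np.linalg.norm(vector)
        # Normalize to unit length; a text with no tokens stays the zero vector.
        return vector / norm if norm > 0 else vector

    def embed_texts(self, texts: List[str]) -> np.ndarray:
        # Stack into a (len(texts), dimension) matrix; empty input gives shape (0, dimension).
        if not texts:
            return np.zeros((0, self._dim), dtype=np.float32)
        return np.vstack([self.embed_text(t) for t in texts])

    def embed_chunks(self, chunks: List[str]) -> np.ndarray:
        # Chunks are treated as plain texts here; a real model might batch requests.
        return self.embed_texts(chunks)


embedder = HashingEmbedder(dim=32)
vectors = embedder.embed_chunks(["John lives in San Francisco.", "John is 30 years old."])
print(vectors.shape)  # (2, 32)
```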