Commit 6fa7f43: Docs cleanup (#1776)

okhat authored Nov 8, 2024
1 parent 97032e1
Showing 11 changed files with 35 additions and 138 deletions.
11 changes: 1 addition & 10 deletions README.md
@@ -68,7 +68,7 @@ Ditto! **DSPy** gives you the right general-purpose modules (e.g., `ChainOfThoug
All you need is:

```bash
-pip install dspy-ai
+pip install dspy
```

To install the very latest from `main`:
@@ -79,13 +79,6 @@ pip install git+https://github.com/stanfordnlp/dspy.git

Or open our intro notebook in Google Colab: [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/dspy/blob/main/intro.ipynb)

By default, DSPy installs the latest `openai` from pip. However, if you have an older version installed from before OpenAI changed their API (`openai~=0.28.1`), the library will use that just fine. Both are supported.

For the optional (alphabetically sorted) [Chromadb](https://github.com/chroma-core/chroma), [LanceDB](https://github.com/lancedb/lancedb), [Groq](https://github.com/groq/groq-python), [Marqo](https://github.com/marqo-ai/marqo), [Milvus](https://github.com/milvus-io/milvus), [MongoDB](https://www.mongodb.com), [MyScaleDB](https://github.com/myscale/myscaledb), Pinecone, [Qdrant](https://github.com/qdrant/qdrant), [Snowflake](https://github.com/snowflakedb/snowpark-python), or [Weaviate](https://github.com/weaviate/weaviate) retrieval integration(s), include the extra(s) below:

```
pip install dspy-ai[chromadb] # or [lancedb] or [groq] or [marqo] or [milvus] or [mongodb] or [myscale] or [pinecone] or [qdrant] or [snowflake] or [weaviate]
```

## 2) Documentation

@@ -140,8 +133,6 @@ If you're new to DSPy, it's probably best to go in sequential order. You will pr

### C) Examples

The DSPy team believes complexity has to be justified. We take this seriously: we never release a complex tutorial (above) or example (below) _unless we can demonstrate empirically that this complexity has generally led to improved quality or cost._ This kind of rule is rarely enforced by other frameworks or docs, but you can count on it in DSPy examples.

There are a bunch of examples in the `examples/` directory and in the top-level directory. We welcome contributions!
You can find other examples tweeted by [@lateinteraction](https://twitter.com/lateinteraction) on Twitter/X.
50 changes: 0 additions & 50 deletions docs/docs/faqs.md
@@ -28,24 +28,6 @@ Other FAQs. We welcome PRs to add formal answers to each of these here. You will

You can specify multiple output fields. For the short-form signature, you can list multiple outputs as comma-separated values following the "->" indicator (e.g. "inputs -> output1, output2"). For the long-form signature, you can include multiple `dspy.OutputField`s.
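
For instance, a minimal sketch of both styles (the field names are illustrative):

```python
import dspy

# Short-form: outputs listed as comma-separated values after "->".
predictor = dspy.Predict("question -> answer, confidence")

# Long-form: one dspy.OutputField per output.
class MultiOutput(dspy.Signature):
    """Answer the question and estimate your confidence."""

    question = dspy.InputField()
    answer = dspy.OutputField()
    confidence = dspy.OutputField(desc="a score between 0 and 1")

predictor_long = dspy.Predict(MultiOutput)
```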

- **How can I work with long responses?**

You can specify the generation of long responses as a `dspy.OutputField`. To ensure comprehensive checks of the content within long-form generations, you can require citations for each referenced piece of context. Constraints such as response length or citation inclusion can be stated through Signature descriptions, or concretely enforced through DSPy Assertions. Check out the [LongFormQA notebook](https://colab.research.google.com/github/stanfordnlp/dspy/blob/main/examples/longformqa/longformqa_assertions.ipynb) to learn more about **generating long-form responses to answer questions**.
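
For instance, a hedged sketch of such a signature (the field names and citation format are illustrative, not taken from the notebook):

```python
import dspy

class LongFormQA(dspy.Signature):
    """Answer the question with a detailed, well-cited response."""

    context = dspy.InputField(desc="passages the answer may cite")
    question = dspy.InputField()
    answer = dspy.OutputField(
        desc="a multi-paragraph answer; cite passages as [n] after each claim"
    )
```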

- **How can I ensure that DSPy doesn't strip newline characters from my inputs or outputs?**

DSPy uses [Signatures](/deep-dive/signature/understanding-signatures) to format prompts passed into LMs. To ensure that newline characters aren't stripped from longer inputs, specify `format=str` when creating a field.

```python
class UnstrippedSignature(dspy.Signature):
    """Enter some information for the model here."""

    title = dspy.InputField()
    object = dspy.InputField(format=str)
    result = dspy.OutputField(format=str)
```

`object` can now be a multi-line string without issue.

- **How do I define my own metrics? Can metrics return a float?**

@@ -111,12 +93,6 @@ You can parallelize DSPy programs during both compilation and evaluation by spec

Modules can be frozen by setting their `._compiled` attribute to True, indicating that the module has gone through optimizer compilation and should not have its parameters adjusted. Optimizers such as `dspy.BootstrapFewShot` handle this internally, ensuring that the student program is frozen before the teacher propagates the gathered few-shot demonstrations during bootstrapping.
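
A minimal sketch of toggling that flag by hand (the module here is a placeholder):

```python
import dspy

program = dspy.ChainOfThought("question -> answer")

program._compiled = True   # freeze: optimizers should leave its parameters alone
program._compiled = False  # un-freeze: parameters may be tuned again
```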

- **How do I get JSON output?**

You can specify JSON-type descriptions in the `desc` field of the long-form signature `dspy.OutputField` (e.g. `output = dspy.OutputField(desc='key-value pairs')`).

If you notice outputs are still not conforming to JSON formatting, try Asserting this constraint! Check out [Assertions](/building-blocks/7-assertions) (or the next question!)
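
For example, a sketch (the JSON schema and the `is_valid_json` helper are hypothetical; such a helper could back a `dspy.Suggest` check inside your module's `forward`):

```python
import json

import dspy

class ExtractInfo(dspy.Signature):
    """Extract structured information from the passage."""

    passage = dspy.InputField()
    info = dspy.OutputField(desc="a JSON object with keys 'name' and 'age'")

def is_valid_json(text: str) -> bool:
    # Hypothetical helper: does the output parse as JSON?
    try:
        json.loads(text)
        return True
    except ValueError:
        return False
```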

- **How do I use DSPy assertions?**

a) **How to Add Assertions to Your Program**:
@@ -139,29 +115,3 @@
If you're dealing with "context too long" errors in DSPy, you're likely using DSPy optimizers to include demonstrations within your prompt, and this is exceeding your current context window. Try reducing these parameters (e.g. `max_bootstrapped_demos` and `max_labeled_demos`). Additionally, you can reduce the number of retrieved passages/docs/embeddings to ensure your prompt fits within your model's context length.

A more general fix is simply increasing the number of `max_tokens` specified in the LM request (e.g. `lm = dspy.OpenAI(model=..., max_tokens=...)`).
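
Concretely, a hedged sketch of both adjustments (the metric and the parameter values are illustrative):

```python
import dspy

def my_metric(example, pred, trace=None):
    # Illustrative metric: exact match on the answer field.
    return example.answer == pred.answer

# Fewer demonstrations means a shorter compiled prompt.
optimizer = dspy.BootstrapFewShot(
    metric=my_metric,
    max_bootstrapped_demos=2,
    max_labeled_demos=2,
)

# A larger max_tokens budget gives the LM more room to respond.
lm = dspy.OpenAI(model="gpt-3.5-turbo", max_tokens=1000)
```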

- **How do I deal with timeouts or backoff errors?**

Firstly, please check with your LM/RM provider to ensure stable service status and sufficient rate limits for your use case!

Additionally, try reducing the number of threads you are testing on, as the corresponding servers may get overloaded with requests and trigger a backoff + retry mechanism.

If all variables seem stable, you may be experiencing timeouts or backoff errors due to incorrect payloads sent to the API providers. Please verify your arguments are compatible with the SDK you are interacting with.

You can configure backoff times for your LM/RM provider by setting `dspy.settings.backoff_time` while configuring your DSPy workflow.

```python
dspy.settings.configure(backoff_time=...)
```

Additionally, if you'd like to set individual backoff times for specific providers, you can do so through the DSPy context manager:

```python
with dspy.context(backoff_time=...):
    dspy.OpenAI(...)  # example

with dspy.context(backoff_time=...):
    dspy.AzureOpenAI(...)  # example
```

At times, DSPy may have hard-coded arguments that are not relevant for your provider; in that case, please feel free to open a PR flagging this, or comment out those default settings for your usage.
4 changes: 0 additions & 4 deletions docs/docs/intro.md
@@ -6,10 +6,6 @@ hide:

---

# DSPy

![DSPy Logo](static/img/dspy_logo.png)

**DSPy is a framework for algorithmically optimizing LM prompts and weights**, especially when LMs are used one or more times within a pipeline. To use LMs to build a complex system _without_ DSPy, you generally have to: (1) break the problem down into steps, (2) prompt your LM well until each step works well in isolation, (3) tweak the steps to work well together, (4) generate synthetic examples to tune each step, and (5) use these examples to finetune smaller LMs to cut costs. Currently, this is hard and messy: every time you change your pipeline, your LM, or your data, all prompts (or finetuning steps) may need to change.

To make this more systematic and much more powerful, **DSPy** does two things. First, it separates the flow of your program (`modules`) from the parameters (LM prompts and weights) of each step. Second, **DSPy** introduces new `optimizers`, which are LM-driven algorithms that can tune the prompts and/or the weights of your LM calls, given a `metric` you want to maximize.
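
As a tiny, hedged sketch of that separation (the model name is a placeholder):

```python
import dspy

# The flow of the program: a module built from a declarative signature.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # placeholder model
qa = dspy.ChainOfThought("question -> answer")

# The parameters of `qa` (its prompt instructions and few-shot demos) are
# what DSPy optimizers tune, given a metric you want to maximize.
print(qa(question="What do DSPy optimizers tune?").answer)
```
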
52 changes: 1 addition & 51 deletions docs/docs/quick-start/installation.md
@@ -8,58 +8,8 @@ To install DSPy run:


```text
-pip install dspy-ai
+pip install dspy
```

Or open our intro notebook in Google Colab: [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/dspy/blob/main/intro.ipynb)

By default, DSPy depends on `openai==0.28`. However, if you install `openai>=1.0`, the library will use that just fine. Both are supported.

For the optional LanceDB, Pinecone, Qdrant, ChromaDB, Marqo, or Milvus retrieval integration(s), include the extra(s) below:

!!! info "Installation Command"

    === "No Extras"
        ```markdown
        pip install dspy-ai
        ```

    === "LanceDB"
        ```markdown
        pip install dspy-ai[lancedb]
        ```

    === "Pinecone"
        ```markdown
        pip install "dspy-ai[pinecone]"
        ```

    === "Qdrant"
        ```markdown
        pip install "dspy-ai[qdrant]"
        ```

    === "ChromaDB"
        ```markdown
        pip install "dspy-ai[chromadb]"
        ```

    === "Marqo"
        ```markdown
        pip install "dspy-ai[marqo]"
        ```

    === "MongoDB"
        ```markdown
        pip install "dspy-ai[mongodb]"
        ```

    === "Weaviate"
        ```markdown
        pip install "dspy-ai[weaviate]"
        ```

    === "Milvus"
        ```markdown
        pip install "dspy-ai[milvus]"
        ```
26 changes: 6 additions & 20 deletions docs/mkdocs.yml
@@ -14,14 +14,9 @@ nav:
    - Build your first pipeline: quick-start/getting-started-02.md
  - Components:
    - Overview: building-blocks/solving_your_task.md
    - Signatures:
      - Overview: building-blocks/2-signatures.md
      - Understanding Signatures: deep-dive/signature/understanding-signatures.md
    - Data Handling:
      - Overview: building-blocks/4-data.md
      - Examples in DSPy: deep-dive/data-handling/examples.md
      - Utilizing Built-in Datasets: deep-dive/data-handling/built-in-datasets.md
      - Creating Custom Dataset: deep-dive/data-handling/loading-custom-data.md
    - Language Models: building-blocks/1-language_models.md
    - Signatures: building-blocks/2-signatures.md
    - Data Handling: building-blocks/4-data.md
    - Modules:
      - Overview: building-blocks/3-modules.md
      - Predict: deep-dive/modules/predict.md
@@ -35,9 +30,9 @@
      - Retrieve: deep-dive/modules/retrieve.md
      - Modules Guide: deep-dive/modules/guide.md
    - Metrics and Assertions:
      - Overview: building-blocks/7-assertions.md
      - Metrics: building-blocks/5-metrics.md
      - Assertions: deep-dive/assertions.md
      - Assertions: building-blocks/7-assertions.md
      - Assertions II: deep-dive/assertions.md
    - Optimizers:
      - Overview: building-blocks/6-optimizers.md
      - LabeledFewShot: deep-dive/optimizers/LabeledFewShot.md
@@ -47,16 +42,7 @@
      - BFRS: deep-dive/optimizers/bfrs.md
      - CoPro: deep-dive/optimizers/copro.md
      - MIProV2: deep-dive/optimizers/miprov2.md
    - Language Model Clients:
      - Overview: building-blocks/1-language_models.md
      - Local Language Model Clients:
        - HFClientTGI: deep-dive/language_model_clients/lm_local_models/HFClientTGI.md
        - HFClientVLLM: deep-dive/language_model_clients/lm_local_models/HFClientVLLM.md
        - MLC: deep-dive/language_model_clients/lm_local_models/MLC.md
        - Ollama: deep-dive/language_model_clients/lm_local_models/Ollama.md
        - LlamaCpp: deep-dive/language_model_clients/lm_local_models/LlamaCpp.md
        - TensorRTLLM: deep-dive/language_model_clients/lm_local_models/TensorRTLLM.md
    - Retrieval Model Clients:
    - Retrieval Model Integrations:
      - Azure: deep-dive/retrieval_models_clients/Azure.md
      - ChromadbRM: deep-dive/retrieval_models_clients/ChromadbRM.md
      - ClarifaiRM: deep-dive/retrieval_models_clients/ClarifaiRM.md
File renamed without changes.
13 changes: 13 additions & 0 deletions examples/llamaindex/dspy_llamaindex_rag.ipynb
@@ -1,5 +1,18 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f8eb12c8",
   "metadata": {},
   "source": [
    "# DEPRECATION WARNING\n",
    "\n",
    "This integration with LlamaIndex is no longer supported.\n",
    "\n",
    "\n",
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "849dbd89-ce04-4a18-84fb-c19f3db5504a",
12 changes: 12 additions & 0 deletions examples/tweets/compiling_langchain.ipynb
@@ -1,5 +1,17 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# DEPRECATION WARNING\n",
    "\n",
    "This integration with LangChain is no longer supported.\n",
    "\n",
    "\n",
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
3 changes: 1 addition & 2 deletions pyproject.toml
@@ -6,7 +6,7 @@ build-backend = "setuptools.build_meta"
#replace_package_name_marker
name="dspy"
#replace_package_version_marker
version="2.5.27"
version="2.5.30"
description = "DSPy"
readme = "README.md"
authors = [{ name = "Omar Khattab", email = "[email protected]" }]
@@ -85,7 +85,6 @@ license = "MIT"
readme = "README.md"
homepage = "https://github.com/stanfordnlp/dspy"
repository = "https://github.com/stanfordnlp/dspy"
# documentation = "https://dspy-ai.readthedocs.io"
keywords = ["dspy", "ai", "language models", "llm", "openai"]
# may be a bit much

2 changes: 1 addition & 1 deletion setup.py
@@ -12,7 +12,7 @@
#replace_package_name_marker
name="dspy",
#replace_package_version_marker
version="2.5.27",
version="2.5.30",
description="DSPy",
long_description=long_description,
long_description_content_type="text/markdown",
