docs: update docs

MoritzLaurer · Nov 18, 2024 · 70ef372
1 parent fb04d20
commit 70ef372
Showing 3 changed files with 16 additions and 14 deletions.
diff --git a/docs/agents_tools.md b/docs/agents_tools.md
@@ -1,5 +1,5 @@
+# Agents and Tools
 
-## Working with tools and agents
 Sharing tools and agents on the HF Hub in a standardized way is not implemented yet.
 This page contains some initial thoughts on this. 
 
@@ -10,7 +10,7 @@ Potential standard ways of storing tools:
 - JSON files: Tool use and function calling are often handled via JSON strings, and different libraries then provide different abstractions on top of this.
 - .py file: libraries like `LangChain` or `Transformers.Agents` enable the use of tools/functions via normal Python functions with docstrings and a decorator. This would be less universally compatible/interoperable, though.
 
-`Transformers.Agents` currently has [Tool.push_to_hub](https://huggingface.co/docs/transformers/v4.45.2/en/main_classes/agent#transformers.Tool.push_to_hub) which pushes tools to the hub as a Space. This makes sense if users want a hosted tool with compute, but it is not interoperable with API client libraries. Some tools & prompts have been stored [here on the Hub](https://huggingface.co/huggingface-tools) in different formats.
+`Transformers.Agents` currently has [Tool.push_to_hub](https://huggingface.co/docs/transformers/v4.45.2/en/main_classes/agent#transformers.Tool.push_to_hub), which pushes tools to the Hub as a Space. Some tools and prompts have been stored this way [here](https://huggingface.co/huggingface-tools) on the Hub. This makes sense if users want a hosted tool with compute, but it is not interoperable with API client libraries.
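
To make the contrast between the two storage options above concrete, here is a minimal sketch of how a plain Python function with a docstring could be turned into a JSON description. The helper `describe_tool`, the example function `get_weather`, and the output keys are hypothetical and chosen for illustration only; they are not the schema used by `LangChain`, `Transformers.Agents`, or any API provider.

```python
import inspect
import json


def get_weather(city: str, unit: str = "celsius") -> str:
    """Return a short weather report for a city."""
    return f"Weather in {city}: 20 degrees {unit}"


def describe_tool(func) -> dict:
    """Build a JSON-serializable description of a plain Python function.

    Hypothetical helper: the key names below are illustrative, not a standard.
    """
    signature = inspect.signature(func)
    parameters = {}
    required = []
    for name, param in signature.parameters.items():
        annotation = param.annotation
        # Fall back to "any" when the parameter has no type annotation.
        if annotation is inspect.Parameter.empty:
            type_name = "any"
        else:
            type_name = annotation.__name__
        parameters[name] = {"type": type_name}
        # Parameters without a default value must be supplied by the caller.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": parameters,
        "required": required,
    }


# The .py representation (the function) and the JSON representation (the string)
# describe the same tool; only the JSON string is language-independent.
tool_json = json.dumps(describe_tool(get_weather), indent=2)
print(tool_json)
```

The JSON string can be read by any client regardless of language, while the `.py` version also carries the executable body, which is the interoperability trade-off described in the list above.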
 
 
 ### How to handle agents?

diff --git a/docs/index.md b/docs/index.md
@@ -1,4 +1,5 @@
-# Hugging Face Hub Prompts 
+# Prompts on the Hugging Face Hub
+
 Prompts have become a key artifact for researchers and practitioners working with AI. 
 There is, however, no standardized way of sharing prompts.
 Prompts are shared on the HF Hub in [.txt files](https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier/blob/main/utils/prompt.txt),

diff --git a/docs/standard_prompt_format.md b/docs/standard_prompt_format.md
@@ -1,6 +1,5 @@
 # Standardizing prompt templates
 
-## The standardized YAML/JSON prompt template format
 The library expects prompts to be stored in YAML or JSON files in any HF Hub repository. See the `Files` tab in these repos for [open-weight model prompts](https://huggingface.co/MoritzLaurer/open_models_special_prompts), [closed-model prompts](https://huggingface.co/MoritzLaurer/closed_system_prompts), or [dataset prompts](https://huggingface.co/datasets/MoritzLaurer/dataset_prompts).
 
 The YAML files must have the following structure:
@@ -37,7 +36,12 @@ and [ChatPromptTemplate](https://python.langchain.com/api_reference/core/prompts
 ### Pro/Con prompts as datasets
 - Some prompt datasets like [awesome-chatgpt-prompts](https://huggingface.co/datasets/fka/awesome-chatgpt-prompts) have received many likes on HF
 - The dataset viewer allows for easy and quick visualization
-- Main con: tabular format is not well suited for reusing prompts and is not standard among practitioners
+- Main cons: the tabular data format is not well suited for reusing prompts and is not standard among practitioners
+    - Prompts are independent units that can be used in different applications, while a dataset forces different prompts into one parquet file
+    - Having multiple prompts in the same dataset forces different prompts to have the same column structure
+    - Extracting a single prompt from a dataset with dataset/pandas-like operations is unnecessarily complicated
+    - Editing a prompt in JSON or YAML is much easier than editing a (parquet) dataset
+    - Data viewers for tabular data are poorly suited to visualizing the structure of long prompts (with line breaks etc.)
 
 
 ### Compatibility with LangChain
@@ -67,12 +71,9 @@ prompt_template_langchain = prompt_template.to_langchain_template()
 
 
 ### Existing prompt template repos:
-- distilabel: https://github.com/argilla-io/distilabel/tree/main/src/distilabel/steps/tasks/templates; https://distilabel.argilla.io/latest/components-gallery/tasks/
-- langchain hub for prompts: https://smith.langchain.com/hub, old public oss repo: https://github.com/hwchase17/langchain-hub
-- langgraph templates for agents: https://blog.langchain.dev/launching-langgraph-templates/
-- deepset prompt hub: https://github.com/deepset-ai/prompthub
-- promptify (not maintained anymore)  https://github.com/promptslab/Promptify/tree/27a53fa8e8f2a4d90f887d06ece65a44466f873a/promptify/prompts
-
-### Other resources on working with prompts
-- langfuse prompt management: https://langfuse.com/docs/prompts/example-langchain
-
+- distilabel [templates](https://github.com/argilla-io/distilabel/tree/main/src/distilabel/steps/tasks/templates) and [tasks](https://distilabel.argilla.io/latest/components-gallery/tasks/) (using pure jinja2 with {{ ... }} for input variables)
+- [LangChain Hub](https://smith.langchain.com/hub) for prompts (the main hub is proprietary; see the old public OSS [repo](https://github.com/hwchase17/langchain-hub), which uses JSON or YAML with {...} for input variables)
+- [LangGraph Templates](https://blog.langchain.dev/launching-langgraph-templates/) (underlying data structure unclear, does not seem to have a collaborative way of sharing templates)
+- [Deepset Prompt Hub](https://github.com/deepset-ai/prompthub) (no longer seems to be maintained, used YAML with {...} for input variables)
+- [Promptify](https://github.com/promptslab/Promptify/tree/27a53fa8e8f2a4d90f887d06ece65a44466f873a/promptify/prompts) (not maintained anymore, used jinja2 with {{ ... }} for input variables)
+- [Langfuse](https://langfuse.com/docs/prompts/get-started) (no public prompt repo, using JSON internally with {{...}} for input variables)
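
The repos listed above differ mainly in how input variables are marked: single braces `{...}` or double braces `{{ ... }}`. The following is a minimal sketch of both conventions using only the Python standard library. The template strings and the helper `render_double_braces` are hypothetical; the regex-based helper is a simplified stand-in for a real jinja2 renderer and handles plain variable names only.

```python
import re

# The same prompt written with the two placeholder conventions.
single_brace_template = "Translate the following text to {language}: {text}"
double_brace_template = "Translate the following text to {{ language }}: {{ text }}"


def render_double_braces(template: str, variables: dict) -> str:
    """Replace {{ name }} placeholders with values from `variables`.

    Simplified stand-in for jinja2: no filters, loops, or expressions.
    """
    pattern = re.compile(r"\{\{\s*(\w+)\s*\}\}")
    return pattern.sub(lambda match: str(variables[match.group(1)]), template)


variables = {"language": "French", "text": "Good morning"}

# Single braces map directly onto Python's built-in str.format.
rendered_single = single_brace_template.format(**variables)
# Double braces need a dedicated renderer.
rendered_double = render_double_braces(double_brace_template, variables)

print(rendered_single)
print(rendered_double)
```

One practical difference: with `str.format`, any literal brace in the prompt (for example an embedded JSON snippet) has to be escaped by doubling it, whereas the double-brace convention leaves single braces untouched.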