Replacing command args with command flags
Signed-off-by: Daniele Martinoli <[email protected]>
dmartinol committed Dec 20, 2024
1 parent 440ad39 commit 3d7757f
Showing 1 changed file with 19 additions and 9 deletions.
28 changes: 19 additions & 9 deletions docs/rag/ilab-rag-retrieval.md
@@ -81,12 +81,12 @@ The proposal is to add a `process` sub-command to the `data` command group.

For the Taxonomy path (no Model Training):
```
ilab data process /path/to/processed/folder
ilab data process --output /path/to/processed/folder
```

For the Plug-and-Play RAG path:
```
ilab data process --input /path/to/docs/folder /path/to/processed/folder
ilab data process --input /path/to/docs/folder --output /path/to/processed/folder
```

#### Command Purpose
@@ -112,18 +112,24 @@ The generated artifacts can later be used to generate and ingest the embeddings
### 2.3 Document Processing Pipeline Options
```bash
% ilab data process --help
Usage: ilab data process [OPTIONS] OUTPUT_DIR
Usage: ilab data process [OPTIONS]

The document processing pipeline

Options:
--input DIRECTORY The folder with user documents to process.
--help Show this message and exit.```
--input DIRECTORY The folder with user documents to process. In case
it's missing, the knowledge taxonomy files will be
processed instead.
--taxonomy-path PATH Directory where taxonomy is stored and accessed from.
--taxonomy-base TEXT Branch of taxonomy used to calculate diff against.
--output DIRECTORY Directory where processed docs are stored.
--help Show this message and exit.
```
| Option Description | Default Value | CLI Flag | Environment Variable |
|--------------------|---------------|----------|----------------------|
| Location folder of user documents. In case it's missing, the taxonomy is navigated to look for updated knowledge documents.| | `--input` | `ILAB_PROCESS_INPUT` |
| Location folder of processed documents. | | `--output` | `ILAB_PROCESS_OUTPUT` |
| Base directories where models are stored. | `$HOME/.cache/instructlab/models` | `--model-dir` | `ILAB_MODEL_DIR` |
| Name of the embedding model. | **TBD** | `--embedding-model` | `ILAB_EMBEDDING_MODEL_NAME` |
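
The table above pairs each CLI flag with an environment variable. A minimal sketch of how that pairing could be resolved is shown below. It uses Python's standard `argparse` as a stand-in and is not the actual `ilab` implementation. The helper names `build_process_parser` and `resolve_source` are hypothetical, and the precedence order (command-line flag, then environment variable, then default) is an assumption the table does not state.

```python
import argparse
import os


def build_process_parser(env=None):
    """Build a parser mirroring the `ilab data process` flags in the table.

    Assumption: a flag given on the command line wins over its environment
    variable, which in turn wins over the documented default.
    """
    env = os.environ if env is None else env
    default_model_dir = os.path.join(
        os.path.expanduser("~"), ".cache", "instructlab", "models"
    )
    parser = argparse.ArgumentParser(prog="ilab data process")
    parser.add_argument(
        "--input",
        default=env.get("ILAB_PROCESS_INPUT"),
        help="Folder with user documents; the taxonomy is used when missing.",
    )
    parser.add_argument(
        "--output",
        default=env.get("ILAB_PROCESS_OUTPUT"),
        help="Folder where processed documents are stored.",
    )
    parser.add_argument(
        "--model-dir",
        default=env.get("ILAB_MODEL_DIR", default_model_dir),
        help="Base directory where models are stored.",
    )
    parser.add_argument(
        "--embedding-model",
        default=env.get("ILAB_EMBEDDING_MODEL_NAME"),
        help="Name of the embedding model (default is TBD in the proposal).",
    )
    return parser


def resolve_source(args):
    """Return which path applies: user documents or the knowledge taxonomy."""
    return "documents" if args.input else "taxonomy"
```

With no `--input` and no `ILAB_PROCESS_INPUT`, `resolve_source` reports the taxonomy path, which matches the behaviour described for a missing input folder.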

@@ -137,7 +143,7 @@ ilab data ingest

For the Taxonomy or Plug-and-Play RAG paths:
```
ilab data ingest /path/to/processed/folder
ilab data ingest --input path/to/processed/folder
```

#### Working Assumption
@@ -170,20 +176,24 @@ context for RAG-based chat pipelines.
### 2.5 Embedding Ingestion Pipeline Options
```bash
% ilab data ingest --help
Usage: ilab data ingest [OPTIONS] INPUT_DIR
Usage: ilab data ingest [OPTIONS]
The embedding ingestion pipeline
Options:
--document-store-type TEXT The document store type, one of:
`milvuslite`, `milvus`.
`milvuslite`.
--document-store-uri TEXT The document store URI
--document-store-collection-name TEXT
The document store collection name
--model-dir TEXT Base directories where models are stored.
[default: (The default system model location
store, located in the data directory.)]
--embedding-model TEXT The embedding model name
--output-dir TEXT Directory where generated datasets are
stored.
--input DIRECTORY Directory where pre-processed documents are
located.
--help Show this message and exit.
```
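
The help text above restricts `--document-store-type` to a fixed set of values. The following is a rough sketch of how such a restriction might be enforced, again using standard `argparse` rather than the real `ilab` code. `build_ingest_parser` and `SUPPORTED_STORE_TYPES` are hypothetical names, and only a subset of the listed options is reproduced.

```python
import argparse

# Store types accepted by the revised help text; the earlier revision
# of the document also listed `milvus`.
SUPPORTED_STORE_TYPES = ("milvuslite",)


def build_ingest_parser():
    """Build a parser for a subset of the `ilab data ingest` options."""
    parser = argparse.ArgumentParser(prog="ilab data ingest")
    parser.add_argument(
        "--document-store-type",
        choices=SUPPORTED_STORE_TYPES,
        help="The document store type.",
    )
    parser.add_argument("--document-store-uri", help="The document store URI.")
    parser.add_argument(
        "--document-store-collection-name",
        help="The document store collection name.",
    )
    parser.add_argument(
        "--input",
        help="Directory where pre-processed documents are located.",
    )
    return parser
```

Passing an unsupported store type makes `argparse` print a usage error and exit, so invalid values are rejected before any ingestion work starts.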

@@ -352,7 +362,7 @@ ilab serve --rag-embeddings --image-name=docker.io/user/my_rag_artifacts:1.0 --p
ilab model chat --rag --retriever-type api --retriever-uri http://localhost:8123
```
[shareable-excalidraw]: https://excalidraw.com/#json=p126_RwjtILDahmnIFK9c,11xoIhRQCkUVRm0Wkg-Ysg
[shareable-excalidraw]: https://excalidraw.com/#json=ZiTMvxn67gep679hqJrpO,VBDADxrmGht7zJSWKQCjug
[ilab-knowledge]: https://github.com/instructlab/taxonomy?tab=readme-ov-file#getting-started-with-knowledge-contributions
[sdg-diff-strategy]: https://github.com/instructlab/sdg/blob/main/src/instructlab/sdg/utils/taxonomy.py
[chat_template]: https://github.com/instructlab/instructlab/blob/0a773f05f8f57285930df101575241c649f591ce/src/instructlab/configuration.py#L244