docs: refactored for clarity and simplicity
Showing 1 changed file with 33 additions and 7 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,6 +13,7 @@ mode: wide | |
- `aws` >= 2.15 ([aws installation guide](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)) | ||
- `kubectl` >= 1.28 ([kubectl installation guide](https://kubernetes.io/docs/tasks/tools/#kubectl)) | ||
- `helm` >= 3.14 ([helm installation guide](https://helm.sh/docs/intro/install/#helm)) | ||
- A Trieve Vector Inference License | ||
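The version minimums above can be checked before starting. The following is a minimal sketch, not part of the original guide: `version_ge` and `check_tool` are hypothetical helper names, and the comparison assumes a `sort` that supports `-V` (GNU coreutils).

```sh
# Return success if version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a tool is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Check that each required CLI is installed.
for tool in aws kubectl helm; do
  check_tool "$tool"
done
```

Once you have read a tool's installed version from its own version command, you can compare it against the minimum, for example `version_ge 2.15.3 2.15 && echo "aws is new enough"`.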
|
||
<Accordion title="IAM Policy Minimum Requirements"> | ||
You need an IAM policy that allows you to use the `eksctl` CLI. | ||
|
@@ -22,8 +23,6 @@ mode: wide | |
You are able to use the root account. However, AWS does not recommend doing this. | ||
</Accordion> | ||
|
||
You'll also need a license to run TVI. | ||
|
||
### Getting your license | ||
|
||
Contact us: | ||
|
@@ -52,17 +51,13 @@ Check quota [here](https://us-east-2.console.aws.amazon.com/servicequotas/home/s | |
|
||
### Setting up environment variables | ||
|
||
Create the EKS cluster and install the required plugins. | ||
|
||
Your AWS Account ID: | ||
```sh | ||
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query "Account" --output text)" | ||
``` | ||
|
||
Your AWS Region: | ||
|
||
<Note> TVI supports any region that offers the chosen `GPU_INSTANCE` type. </Note> | ||
|
||
```sh | ||
export AWS_REGION=us-east-2 | ||
``` | ||
|
@@ -99,8 +94,15 @@ export GPU_COUNT=1 | |
export AWS_PAGER="" | ||
``` | ||
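Before moving on, it can help to confirm that the variables exported above are actually set in the current shell. This is a minimal sketch under that assumption; `missing_vars` is a hypothetical helper, and the variable list is limited to the names that appear in this guide (`AWS_PAGER` is left out because it is intentionally empty).

```sh
# Print the name of every listed variable that is unset or empty.
missing_vars() {
  for name in "$@"; do
    # Indirectly read the variable named by $name (POSIX-compatible).
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

# Lists any of the guide's variables that still need to be exported.
missing_vars AWS_ACCOUNT_ID AWS_REGION GPU_INSTANCE GPU_COUNT
```

No output means every listed variable has a value.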
|
||
<Note> TVI supports any region that offers the chosen `GPU_INSTANCE` type. </Note> | ||
|
||
|
||
### Create your cluster | ||
|
||
Create the EKS cluster and install the required plugins. | ||
|
||
The `bootstrap-eks.sh` script creates the EKS cluster, installs the AWS Load Balancer Controller, and installs the NVIDIA Device Plugin. It also sets up the IAM permissions that the plugins need to work. | ||
|
||
Download the `bootstrap-eks.sh` script | ||
```sh | ||
wget cdn.trieve.ai/bootstrap-eks.sh | ||
|
@@ -165,8 +167,11 @@ models: | |
### Install the helm chart | ||
|
||
<Info> | ||
This helm chart will only work if you subscribe to the AWS Marketplace Listing | ||
This helm chart will only work if you subscribe to the AWS Marketplace Listing. | ||
</Info> | ||
<Info> | ||
Contact us at [email protected] if you do not have access to, or cannot use, the AWS Marketplace. | ||
</Info> | ||
|
||
<Steps> | ||
<Step title="Log in to the AWS ECR repository"> | ||
|
@@ -205,6 +210,27 @@ vector-inference-embedding-spladequery-ingress alb * k8s-default-ve | |
|
||
The `Address` field is the endpoint to which you can send [dense embedding](/vector-inference/embed), [sparse embedding](/vector-inference/embed_sparse), or [reranker](/vector-inference/reranker) requests, depending on the models you chose. | ||
|
||
## To ensure everything is working, make a request to the model endpoint provided. | ||
|
||
```sh | ||
# Replace the endpoint with the one you got from the previous step | ||
export ENDPOINT=k8s-default-vectorin-18b7ade77a-2040086997.us-east-2.elb.amazonaws.com | ||
|
||
curl -X POST \ | ||
-H "Content-Type: application/json" \ | ||
-d '{"inputs": "test input"}' \ | ||
--url "http://$ENDPOINT/embed" \ | ||
-w "\n\nInference took %{time_total} seconds!\n" | ||
``` | ||
|
||
The output should look something like this: | ||
|
||
```sh | ||
# The vector | ||
[[ 0.038483415, -0.00076982786, -0.020039458 ... ], [ 0.04496114, -0.039057795, -0.022400795, ... ]] | ||
Inference took 0.067066 seconds! | ||
``` | ||
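If you want to script this check, one option is a rough shape test on the response body. This is only a sketch: `looks_like_embeddings` is a hypothetical helper, and it assumes the `/embed` response is a nested JSON array like the sample output above. It does not parse the JSON.

```sh
# Return success if the body starts with "[[", i.e. looks like a JSON
# array of vectors. This is a shape check only, not a JSON parser.
looks_like_embeddings() {
  case "$1" in
    '[['*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Sample body shaped like the output above (values shortened for illustration).
sample='[[0.038483415, -0.00076982786, -0.020039458]]'
if looks_like_embeddings "$sample"; then
  echo "response looks like embeddings"
else
  echo "unexpected response"
fi
```

To use it against the live endpoint, pass in the body returned by the `curl` request instead of `sample`, without the `-w` timing line.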
|
||
## Using Trieve Vector Inference | ||
|
||
Each `ingress` uses its own Application Load Balancer within AWS. The `Address` provided is the model's endpoint, to which you can send [dense embedding](/vector-inference/embed), [sparse embedding](/vector-inference/embed_sparse), or [reranker](/vector-inference/reranker) requests, depending on the models you chose. | ||
|