docs: refactored for clarity and simplicity
cdxker committed Nov 21, 2024
1 parent 5a37a60 commit d530cef
Showing 1 changed file with 33 additions and 7 deletions.
40 changes: 33 additions & 7 deletions vector-inference/aws-installation.mdx
@@ -13,6 +13,7 @@ mode: wide
- `aws` >= 2.15 ([aws installation guide](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html))
- `kubectl` >= 1.28 ([kubectl installation guide](https://kubernetes.io/docs/tasks/tools/#kubectl))
- `helm` >= 3.14 ([helm installation guide](https://helm.sh/docs/intro/install/#helm))
- A Trieve Vector Inference License
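
You can quickly confirm the CLI prerequisites are installed and recent enough with a version check (a minimal sketch; output formats vary slightly between versions):

```sh
# Print the versions of the required CLIs
aws --version
kubectl version --client
helm version --short
```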

<Accordion title="IAM Policy Minimum Requirements">
You need an IAM policy that allows you to use the `eksctl` CLI.
@@ -22,8 +23,6 @@ mode: wide
You can use the root account, but AWS does not recommend it.
</Accordion>
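
One way to double-check which IAM identity and policies your CLI session is using (a minimal sketch; the user name is a placeholder):

```sh
# Show the identity the AWS CLI is currently authenticated as
aws sts get-caller-identity

# List the managed policies attached to that IAM user (replace the placeholder)
aws iam list-attached-user-policies --user-name <your-iam-user>
```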

### Getting your license

Contact us:
@@ -52,17 +51,13 @@ Check quota [here](https://us-east-2.console.aws.amazon.com/servicequotas/home/s
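
If you prefer the CLI, the relevant EC2 quota can also be inspected with Service Quotas (a sketch; the name fragment below assumes a G-family GPU instance and may need adjusting for other instance families):

```sh
# List EC2 on-demand quotas whose names mention G and VT instances
aws service-quotas list-service-quotas \
  --service-code ec2 \
  --query "Quotas[?contains(QuotaName, 'G and VT')].[QuotaName,Value]" \
  --output table
```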

### Setting up environment variables

Your AWS Account ID:
```sh
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query "Account" --output text)"
```

Your AWS Region:

```sh
export AWS_REGION=us-east-2
```
@@ -99,8 +94,15 @@ export GPU_COUNT=1
export AWS_PAGER=""
```

<Note> TVI supports all regions that offer your chosen `GPU_INSTANCE`. </Note>
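
Before creating the cluster, you can confirm that the chosen region actually offers the chosen instance type (a minimal sketch; assumes `AWS_REGION` and `GPU_INSTANCE` have been exported as above):

```sh
# Prints the instance type if it is offered in the region; empty output means it is not
aws ec2 describe-instance-type-offerings \
  --region "$AWS_REGION" \
  --filters "Name=instance-type,Values=$GPU_INSTANCE" \
  --query "InstanceTypeOfferings[].InstanceType" \
  --output text
```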


### Create your cluster

Create the EKS cluster and install the needed plugins.

The `bootstrap-eks.sh` script creates the EKS cluster, installs the AWS Load Balancer Controller, and installs the NVIDIA Device Plugin. It also sets up any IAM permissions the plugins need.
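
Once the script has finished (it is downloaded and run in the steps below), a quick way to confirm both plugins came up is to check their pods and the GPU capacity the nodes advertise (a sketch; pod names and namespaces can vary between versions):

```sh
# The load balancer controller and NVIDIA device plugin normally run in kube-system
kubectl get pods -n kube-system | grep -Ei 'aws-load-balancer|nvidia'

# GPU nodes should advertise the nvidia.com/gpu resource
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPUS:.status.capacity.nvidia\.com/gpu'
```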

Download the `bootstrap-eks.sh` script:
```sh
wget cdn.trieve.ai/bootstrap-eks.sh
@@ -165,8 +167,11 @@ models:
### Install the helm chart

<Info>
This helm chart will only work if you subscribe to the AWS Marketplace Listing.
</Info>
<Info>
Contact us at [email protected] if you do not have access to, or cannot use, the AWS Marketplace.
</Info>

<Steps>
<Step title="Log in to the AWS ECR repository">
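A typical login for a chart delivered through the AWS Marketplace ECR registry looks something like the following (a sketch; the registry hostname and region are placeholders, so use the values from your listing):

```sh
# Authenticate Helm against the Marketplace ECR registry (hostname is a placeholder)
aws ecr get-login-password --region us-east-1 | \
  helm registry login --username AWS --password-stdin <marketplace-registry>.dkr.ecr.us-east-1.amazonaws.com
```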
@@ -205,6 +210,27 @@ vector-inference-embedding-spladequery-ingress alb * k8s-default-ve

The `Address` field is the endpoint where you can make [dense embedding](/vector-inference/embed), [sparse embedding](/vector-inference/embed_sparse), or [reranker](/vector-inference/reranker) calls, depending on the models you chose.

## Verify your deployment

To ensure everything is working, make a request to the model endpoint provided.

```sh
# Replace the endpoint with the one you got from the previous step
export ENDPOINT=k8s-default-vectorin-18b7ade77a-2040086997.us-east-2.elb.amazonaws.com

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "test input"}' \
  --url "http://$ENDPOINT/embed" \
  -w "\n\nInference took %{time_total} seconds\n"
```

The output should look something like this:

```sh
# The embedding vectors
[[ 0.038483415, -0.00076982786, -0.020039458 ... ], [ 0.04496114, -0.039057795, -0.022400795, ... ]]
Inference took 0.067066 seconds
```

## Using Trieve Vector Inference

Each `ingress` uses its own Application Load Balancer within AWS. The `Address` provided is the model's endpoint, where you can make [dense embedding](/vector-inference/embed), [sparse embedding](/vector-inference/embed_sparse), or [reranker](/vector-inference/reranker) calls, depending on the models you chose.
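
For example, a sparse-embedding request against a SPLADE ingress could look like the following (a sketch; the hostname is illustrative and the `/embed_sparse` path is assumed from the docs page of the same name, so check the linked pages for the exact schema):

```sh
# Hostname is illustrative; use the Address of your sparse-embedding ingress
export SPARSE_ENDPOINT=k8s-default-vectorin-xxxxxxxxxx.us-east-2.elb.amazonaws.com

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "test input"}' \
  --url "http://$SPARSE_ENDPOINT/embed_sparse"
```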
