This is an example Node.js application that uses embeddings and the LLaMA model for text retrieval and response generation. It processes a text corpus, generates embeddings for text "chunks", and uses these embeddings to perform a "similarity search" in response to queries. The system consists of a Node.js server that handles API requests and a p5.js sketch for client interaction.
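The "similarity search" here is a straightforward nearest-neighbor lookup over the precomputed vectors. Below is a minimal sketch of that idea; the `{ text, embedding }` shape for entries in `embeddings.json` is an assumption, so adjust it to match whatever `save-embeddings.js` actually writes.

```js
// Minimal sketch of the similarity search: rank stored chunks by cosine
// similarity to a query embedding. Assumes embeddings.json is an array of
// { text, embedding } objects (adjust to however save-embeddings.js writes it).
import fs from 'fs';

function cosineSimilarity(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

function findNearestChunks(queryEmbedding, chunks, topK = 3) {
  return chunks
    .map((chunk) => ({
      text: chunk.text,
      similarity: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}

// The query embedding would come from the same embedding model used in save-embeddings.js.
const chunks = JSON.parse(fs.readFileSync('embeddings.json', 'utf-8'));
```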
- `server.js`: Server file that handles API requests and integrates with the Replicate API.
- `save-embeddings.js`: Processes a text file and generates embeddings.
- `test-embeddings.js`: Tests the embeddings search functionality without all that client/server stuff.
- `embeddings.json`: Precomputed embeddings generated from the text corpus.
- `public/`: p5.js sketch.
- `.env`: Replicate API token.
- Using open-source models for faster and cheaper text embeddings
- How to use retrieval augmented generation
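Retrieval-augmented generation, as used here, amounts to prompt assembly: the top-matching chunks from the similarity search are pasted into the prompt ahead of the user's question before the LLaMA model is called via Replicate. A rough sketch of that step follows; the model identifier and prompt format are placeholders, not necessarily what `server.js` uses.

```js
// Sketch of the RAG step: stuff the retrieved chunks into a prompt, then ask LLaMA.
// The model name and prompt format are placeholders; check server.js for the real ones.
import Replicate from 'replicate';

const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });

async function answerWithContext(question, topChunks) {
  const context = topChunks.map((chunk) => chunk.text).join('\n\n');
  const prompt = `Use the following context to answer the question.\n\nContext:\n${context}\n\nQuestion: ${question}\nAnswer:`;
  const output = await replicate.run('meta/meta-llama-3-8b-instruct', {
    input: { prompt },
  });
  // Replicate returns language model output as an array of string tokens.
  return output.join('');
}
```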
- Install dependencies:

```
npm install
```
- Set up the `.env` file with your Replicate API token:

```
REPLICATE_API_TOKEN=your_api_token_here
```
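The token is only ever read from the environment by the scripts. For reference, loading it might look like the sketch below, which assumes the dotenv package (check `server.js` for how the project actually loads it).

```js
// Minimal sketch of reading the token from .env (assumes the dotenv package).
import 'dotenv/config';

const token = process.env.REPLICATE_API_TOKEN;
if (!token) {
  throw new Error('Missing REPLICATE_API_TOKEN, check your .env file');
}
```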
- Generate the `embeddings.json` file by running `save-embeddings.js`. (You'll need to hard-code a text filename and adjust how the text is split up depending on the format of your data.)

```js
const raw = fs.readFileSync('text-corpus.txt', 'utf-8');
let chunks = raw.split(/\n+/);
```

```
node save-embeddings.js
```
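For orientation, the overall shape of `save-embeddings.js` is: read the corpus, split it into chunks, embed each chunk through a Replicate-hosted embedding model, and write everything out as `embeddings.json`. The sketch below assumes a particular embedding model and input/output shape; match them to whatever the script actually calls.

```js
// Rough sketch of generating and saving embeddings. The embedding model name and
// its input/output shape are assumptions - adapt them to what save-embeddings.js uses.
import fs from 'fs';
import Replicate from 'replicate';

const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });

const raw = fs.readFileSync('text-corpus.txt', 'utf-8');
const chunks = raw.split(/\n+/).filter((chunk) => chunk.trim().length > 0);

const embeddings = [];
for (const text of chunks) {
  // Placeholder model identifier; the output is assumed to be the embedding vector.
  const output = await replicate.run('replicate/all-mpnet-base-v2', {
    input: { text },
  });
  embeddings.push({ text, embedding: output });
}

fs.writeFileSync('embeddings.json', JSON.stringify(embeddings));
console.log(`Saved ${embeddings.length} embeddings to embeddings.json`);
```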
- Run the server:

```
node server.js
```

Open a browser to http://localhost:3000 (or whatever port is specified).
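The p5.js sketch in `public/` is what actually talks to the server, but if you want to hit the API directly from the browser console, a request might look roughly like this. The endpoint path and payload shape are guesses; check `server.js` and the sketch for the real ones.

```js
// Hypothetical client-side query; the /api/query route and { question } payload
// are assumptions, not necessarily what server.js defines.
async function askQuestion(question) {
  const response = await fetch('/api/query', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question }),
  });
  const data = await response.json();
  console.log(data);
  return data;
}
```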