Fix bad GenAI batching example (neo4j#891)
The current example doesn't do what it says it does
nilsceberg authored and renetapopova committed Feb 23, 2024
1 parent 10e9899 commit 1752fdc
Showing 1 changed file with 19 additions and 14 deletions: modules/ROOT/pages/genai-integrations.adoc
@@ -39,8 +39,7 @@ You can use the `genai.vector.encode()` function to generate a vector embedding

[IMPORTANT]
====
-This function sends one API request every time it is called, which may result in a lot of overhead in terms of both
-network traffic and latency.
+This function sends one API request every time it is called, which may result in a lot of overhead in terms of both network traffic and latency.
If you want to generate many embeddings at once, use <<multiple-embeddings, `genai.vector.encodeBatch()`>>.
====

@@ -84,8 +83,7 @@ CALL db.index.vector.queryNodes("my_index", 10, queryVector) YIELD node, score R
.Generating an embedding on the fly
====
-Assuming nodes with the `Tweet` label have an `id` property and a `text` property, you can generate and return
-the text embedding for the tweet with ID 1234:
+Assuming nodes with the `Tweet` label have an `id` property and a `text` property, you can generate and return the text embedding for the tweet with ID 1234:
.Query
[source,cypher]
@@ -101,8 +99,7 @@ RETURN genai.vector.encode(n.text, "VertexAI", { token: $token, projectId: $proj
You can use the `genai.vector.encodeBatch()` procedure to generate many vector embeddings with a single API request.
This procedure takes a list of resources as an input, and returns the same number of result rows, instead of a single one.

-Using this procedure is recommended in cases where a single large resource is split up into multiple chunks (like the pages of a book),
-or when generating embeddings for a large number of resources.
+Using this procedure is recommended in cases where a single large resource is split up into multiple chunks (like the pages of a book), or when generating embeddings for a large number of resources.

[IMPORTANT]
====
@@ -153,22 +150,30 @@ CREATE (:Page { index: index, text: resource, vector: vector })-[:OF]->(book)
.Generate embeddings for many text properties
====
-If you want to generate embeddings for the text content of all nodes with the label `Tweet`, you can use `CALL ... IN TRANSACTIONS` to split the work up into batches and issue one API request per batch.
-Assuming nodes with the `Tweet` label have a `text` property, you can generate vector embeddings for each one and write them to the `embedding` property on each one in batches of 1000, for example:
+If you want to generate embeddings for the text content of all nodes with the label `Tweet`, you can divide the nodes up into batches, and issue one API request per batch.
+Assuming nodes with the `Tweet` label have a `text` property, you can generate vector embeddings for each one and write them to their `embedding` property in batches of, for example, a thousand at a time.
+You can use this in combination with `CALL ... IN TRANSACTIONS` to commit each batch separately to manage transaction memory consumption:
.Query
[source,cypher]
----
 MATCH (n:Tweet)
-WHERE n.text IS NOT NULL
+WHERE size(n.text) <> 0 AND n.embedding IS NULL
+WITH collect(n) AS nodes,
+     count(*) AS total,
+     1000 AS batchSize
+UNWIND range(0, total, batchSize) AS batchStart
 CALL {
-  WITH n
-  WITH collect(n) AS nodes, collect(n.text) AS resources
-  CALL genai.vector.encodeBatch(resources, "VertexAI", { token: $token, projectId: $project }) YIELD index, vector
-  CALL db.create.setNodeVectorProperty(nodes[index], "vector", vector)
-} IN TRANSACTIONS OF 1000 ROWS
+  WITH nodes, batchStart, batchSize
+  WITH nodes, batchStart, [node IN nodes[batchStart .. batchStart + batchSize] | node.text] AS batch
+  CALL genai.vector.encodeBatch(batch, "OpenAI", { token: $token }) YIELD index, vector
+  CALL db.create.setNodeVectorProperty(nodes[batchStart + index], "embedding", vector)
+} IN TRANSACTIONS OF 1 ROW
 ----
+
+You can control how many batches are committed by each inner transaction by modifying the `OF 1 ROW` clause.
+For example, `OF 10 ROWS` will only commit once per 10 batches.
+Because vector embeddings can be very large, this may require significantly more memory.
====

[[ai-providers]]
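Editor's note: the heart of the fix is that the old example collected all nodes inside one subquery invocation, so it never actually batched; the new query slices the collected nodes with `UNWIND range(0, total, batchSize)` and addresses results as `nodes[batchStart + index]`. The same slicing arithmetic can be sketched in Python. This is a minimal illustration of the pattern only, with a hypothetical `encode_batch` stub standing in for the real `genai.vector.encodeBatch` call; it is not Neo4j driver code.

```python
def encode_batch(texts):
    # Stub for genai.vector.encodeBatch: one "API request" per call,
    # yielding one (index, vector) pair per input text.
    return [(i, [float(len(t))]) for i, t in enumerate(texts)]

def embed_in_batches(texts, batch_size):
    # Mirrors UNWIND range(0, total, batchSize): one request per slice,
    # with results written back at the global position batch_start + index.
    embeddings = [None] * len(texts)
    requests = 0
    for batch_start in range(0, len(texts), batch_size):
        batch = texts[batch_start : batch_start + batch_size]
        requests += 1
        for index, vector in encode_batch(batch):
            embeddings[batch_start + index] = vector
    return embeddings, requests

tweets = [f"tweet {i}" for i in range(2500)]
vectors, requests = embed_in_batches(tweets, 1000)
print(requests)  # 3 requests for 2500 tweets at batch size 1000
```

Every text gets an embedding, but only `ceil(2500 / 1000) = 3` requests are issued, which is the overhead reduction the `encodeBatch` procedure exists for.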
