From 0aac838b4bc865fa3465c3d0eb512225832cdad2 Mon Sep 17 00:00:00 2001
From: Nils Ceberg
Date: Tue, 20 Feb 2024 17:27:50 +0100
Subject: [PATCH 1/4] Fix bad GenAI batching example

---
 modules/ROOT/pages/genai-integrations.adoc | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/modules/ROOT/pages/genai-integrations.adoc b/modules/ROOT/pages/genai-integrations.adoc
index 3064da3d7..f9cfb13f8 100644
--- a/modules/ROOT/pages/genai-integrations.adoc
+++ b/modules/ROOT/pages/genai-integrations.adoc
@@ -153,21 +153,28 @@ CREATE (:Page { index: index, text: resource, vector: vector })-[:OF]->(book)
 .Generate embeddings for many text properties
 ====
 
-If you want to generate embeddings for the text content of all nodes with the label `Tweet`, you can use `CALL ... IN TRANSACTIONS` to split the work up into batches and issue one API request per batch.
+If you want to generate embeddings for the text content of all nodes with the label `Tweet`, you can
+divide the nodes up into batches, and issue one API request per batch.
 
-Assuming nodes with the `Tweet` label have a `text` property, you can generate vector embeddings for each one and write them to the `embedding` property on each one in batches of 1000, for example:
+Assuming nodes with the `Tweet` label have a `text` property, you can generate vector embeddings
+for each one and write them to their `embedding` property in batches of a thousand at a time, for example.
+Here we use `CALL ... IN TRANSACTIONS` to commit each batch separately to manage transaction memory consumption:
 
 .Query
 [source,cypher]
 ----
 MATCH (n:Tweet)
-WHERE n.text IS NOT NULL
+WHERE size(n.text) <> 0 AND n.embedding IS NULL
+WITH collect(n) AS nodes,
+     count(*) AS total,
+     1000 AS batchSize
+UNWIND range(0, total, batchSize) AS batchStart
 CALL {
-    WITH n
-    WITH collect(n) AS nodes, collect(n.text) AS resources
-    CALL genai.vector.encodeBatch(resources, "VertexAI", { token: $token, projectId: $project }) YIELD index, vector
-    CALL db.create.setNodeVectorProperty(nodes[index], "vector", vector)
-} IN TRANSACTIONS OF 1000 ROWS
+    WITH nodes, batchStart, batchSize
+    WITH nodes, batchStart, [node IN nodes[batchStart .. batchStart + batchSize] | node.text] AS batch
+    CALL genai.vector.encodeBatch(batch, "OpenAI", { token: $token }) YIELD index, vector
+    CALL db.create.setNodeVectorProperty(nodes[batchStart + index], "embedding", vector)
+} IN TRANSACTIONS OF 1 ROW
 ----
 ====
 

From d67524c9bc3665648a7630bff259fbc69feb9db3 Mon Sep 17 00:00:00 2001
From: Nils Ceberg
Date: Tue, 20 Feb 2024 18:16:01 +0100
Subject: [PATCH 2/4] Elaborate on CALL IN TRANSACTIONS

---
 modules/ROOT/pages/genai-integrations.adoc | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/modules/ROOT/pages/genai-integrations.adoc b/modules/ROOT/pages/genai-integrations.adoc
index f9cfb13f8..64e47bae9 100644
--- a/modules/ROOT/pages/genai-integrations.adoc
+++ b/modules/ROOT/pages/genai-integrations.adoc
@@ -176,6 +176,10 @@ CALL {
     CALL db.create.setNodeVectorProperty(nodes[batchStart + index], "embedding", vector)
 } IN TRANSACTIONS OF 1 ROW
 ----
+
+You can control how many batches are committed by each inner transaction by modifying the `OF 1 ROW` clause.
+For example, `OF 10 ROWS` will only commit once per 10 transactions. Because vector embeddings can be very large,
+this may require significantly more memory.
 ====
 
 [[ai-providers]]

From 9c0fb29956d5e403a5cfa4375ef83b392f9fd781 Mon Sep 17 00:00:00 2001
From: Nils Ceberg
Date: Wed, 21 Feb 2024 12:10:26 +0100
Subject: [PATCH 3/4] Review feedback

---
 modules/ROOT/pages/genai-integrations.adoc | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/modules/ROOT/pages/genai-integrations.adoc b/modules/ROOT/pages/genai-integrations.adoc
index 64e47bae9..6e2ae692d 100644
--- a/modules/ROOT/pages/genai-integrations.adoc
+++ b/modules/ROOT/pages/genai-integrations.adoc
@@ -39,8 +39,7 @@ You can use the `genai.vector.encode()` function to generate a vector embedding
 
 [IMPORTANT]
 ====
-This function sends one API request every time it is called, which may result in a lot of overhead in terms of both
-network traffic and latency.
+This function sends one API request every time it is called, which may result in a lot of overhead in terms of both network traffic and latency.
 
 If you want to generate many embeddings at once, use <>.
 ====
@@ -84,8 +83,7 @@ CALL db.index.vector.queryNodes("my_index", 10, queryVector) YIELD node, score R
 
 .Generating an embedding on the fly
 ====
-Assuming nodes with the `Tweet` label have an `id` property and a `text` property, you can generate and return
-the text embedding for the tweet with ID 1234:
+Assuming nodes with the `Tweet` label have an `id` property and a `text` property, you can generate and return the text embedding for the tweet with ID 1234:
 
 .Query
 [source,cypher]
@@ -101,8 +99,7 @@ RETURN genai.vector.encode(n.text, "VertexAI", { token: $token, projectId: $proj
 
 You can use the `genai.vector.encodeBatch()` procedure to generate many vector embeddings with a single API request.
 This procedure takes a list of resources as an input, and returns the same number of result rows, instead of a single one.
-Using this procedure is recommended in cases where a single large resource is split up into multiple chunks (like the pages of a book),
-or when generating embeddings for a large number of resources.
+Using this procedure is recommended in cases where a single large resource is split up into multiple chunks (like the pages of a book), or when generating embeddings for a large number of resources.
 
 [IMPORTANT]
 ====
@@ -153,12 +150,10 @@ CREATE (:Page { index: index, text: resource, vector: vector })-[:OF]->(book)
 .Generate embeddings for many text properties
 ====
 
-If you want to generate embeddings for the text content of all nodes with the label `Tweet`, you can
-divide the nodes up into batches, and issue one API request per batch.
+If you want to generate embeddings for the text content of all nodes with the label `Tweet`, you can divide the nodes up into batches, and issue one API request per batch.
 
-Assuming nodes with the `Tweet` label have a `text` property, you can generate vector embeddings
-for each one and write them to their `embedding` property in batches of a thousand at a time, for example.
-Here we use `CALL ... IN TRANSACTIONS` to commit each batch separately to manage transaction memory consumption:
+Assuming nodes with the `Tweet` label have a `text` property, you can generate vector embeddings for each one and write them to their `embedding` property in batches of a thousand at a time, for example.
+You can use this in combination with `CALL ... IN TRANSACTIONS` to commit each batch separately to manage transaction memory consumption:
 
 .Query
 [source,cypher]
@@ -178,8 +173,7 @@ CALL {
 ----
 
 You can control how many batches are committed by each inner transaction by modifying the `OF 1 ROW` clause.
-For example, `OF 10 ROWS` will only commit once per 10 transactions. Because vector embeddings can be very large,
-this may require significantly more memory.
+For example, `OF 10 ROWS` will only commit once per 10 transactions. Because vector embeddings can be very large, this may require significantly more memory.
 ====
 
 [[ai-providers]]

From eb1f9419c002c7cecc84290fc691d98cac53f398 Mon Sep 17 00:00:00 2001
From: Nils Ceberg
Date: Wed, 21 Feb 2024 14:38:55 +0100
Subject: [PATCH 4/4] More review feedback

---
 modules/ROOT/pages/genai-integrations.adoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/modules/ROOT/pages/genai-integrations.adoc b/modules/ROOT/pages/genai-integrations.adoc
index 6e2ae692d..10ba91dbc 100644
--- a/modules/ROOT/pages/genai-integrations.adoc
+++ b/modules/ROOT/pages/genai-integrations.adoc
@@ -152,7 +152,7 @@ CREATE (:Page { index: index, text: resource, vector: vector })-[:OF]->(book)
 
 If you want to generate embeddings for the text content of all nodes with the label `Tweet`, you can divide the nodes up into batches, and issue one API request per batch.
 
-Assuming nodes with the `Tweet` label have a `text` property, you can generate vector embeddings for each one and write them to their `embedding` property in batches of a thousand at a time, for example.
+Assuming nodes with the `Tweet` label have a `text` property, you can generate vector embeddings for each one and write them to their `embedding` property in batches of, for example, a thousand at a time.
 You can use this in combination with `CALL ... IN TRANSACTIONS` to commit each batch separately to manage transaction memory consumption:
 
 .Query
@@ -173,7 +173,7 @@ CALL {
 ----
 
 You can control how many batches are committed by each inner transaction by modifying the `OF 1 ROW` clause.
-For example, `OF 10 ROWS` will only commit once per 10 transactions. Because vector embeddings can be very large, this may require significantly more memory.
+For example, `OF 10 ROWS` will only commit once per 10 batches. Because vector embeddings can be very large, this may require significantly more memory.
 ====
 
 [[ai-providers]]
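Below is a small, optional sanity check to run after the patched batching example. It is an illustrative sketch, not part of the patch series: it assumes the `Tweet` label and `embedding` property used in the example above, and the alias `tweetsWithoutEmbedding` is arbitrary. The `$token` parameter required by the batching query itself can be supplied beforehand, for example with `:param token => '...'` in Neo4j Browser or cypher-shell.

.Query
[source,cypher]
----
// Count the Tweet nodes that still lack an embedding after the batched run
// (for instance, nodes skipped because their text property was empty).
MATCH (n:Tweet)
WHERE n.embedding IS NULL
RETURN count(n) AS tweetsWithoutEmbedding
----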