Merge branch 'release/v24.0' into main
ryanfoxtyler authored Aug 11, 2024
2 parents 77de360 + 2302bc7 commit 1759fe3
Showing 16 changed files with 237 additions and 15 deletions.
1 change: 1 addition & 0 deletions .github/styles/Vocab/Dgraph/accept.txt
@@ -130,6 +130,7 @@ rebalancing
unary
loopback
snake_case
semver

Leia
Skywalker
2 changes: 1 addition & 1 deletion LICENSE.md
@@ -1,6 +1,6 @@
## Dgraph Licensing

Copyright 2016-2021 Dgraph Labs, Inc.
Copyright 2016-2024 Dgraph Labs, Inc.

Source code in this repository is variously licensed under the Apache Public
License 2.0 (APL) and the Dgraph Community License (DCL). A copy of each license
2 changes: 2 additions & 0 deletions README.md
@@ -38,7 +38,9 @@ Making our documentation easy to understand includes optimizing it for client-si
Use hugo shortcode for relref.

For example, to reference a term, use a relref to the glossary:
```
> [entity]({{< relref "dgraph-glossary.md#entity" >}})
```

### Staging doc updates locally

1 change: 1 addition & 0 deletions content/deploy/cli-command-reference.md
@@ -128,6 +128,7 @@ The `--badger` superflag allows you to set many advanced [Badger options](https:
| `--query_edge_limit` | uint64 | `query-edge` | uint64 |`alpha`| Maximum number of edges that can be returned in a query |
| `--normalize_node_limit` | int | `normalize-node` | int |`alpha`| Maximum number of nodes that can be returned in a query that uses the normalize directive |
| `--mutations_nquad_limit` | int | `mutations-nquad` | int |`alpha`| Maximum number of nquads that can be inserted in a mutation request |
| `--max-pending-queries` | int | `max-pending-queries` | int |`alpha`| Maximum number of concurrently processing requests allowed before requests are rejected with 429 Too Many Requests |
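
A client that receives this 429 status may want to pause and retry. The following is a minimal Python sketch of retrying with exponential backoff. It makes several assumptions: `send` is a hypothetical callable standing in for whatever client call you use (it is not a Dgraph API) and is assumed to return a `(status, body)` pair, and the attempt count and delays are illustrative values, not recommendations.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call `send` until it stops returning HTTP 429 or attempts run out.

    `send` is a hypothetical zero-argument callable returning (status, body).
    The delay doubles after every rejected attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        # Do not sleep after the final attempt; just give up.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body
```

Passing `sleep` as a parameter keeps the sketch easy to test without real waiting.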

### Raft superflag

2 changes: 1 addition & 1 deletion content/design-concepts/replication-concept.md
@@ -9,4 +9,4 @@ weight = 85
Each Highly-Available (HA) group will be served by at least 3 instances (or two if one is temporarily unavailable). In the case of an alpha instance
failure, other alpha instances in the same group still handle the load for data in that group. In case of a zero instance failure, the remaining two zeros in the zero group will continue to hand out timestamps and perform other zero functions.

In addition, Dgraph `Learner Nodes` are alpha instances that hold replicas of data, but this replication is to suupport read replicas, often in a different geography from the master cluster. This replication is implemented the same way as HA replication, but the learner nodes do not participate in quorum, and do not take over from failed nodes to provide high availability.
In addition, Dgraph `Learner Nodes` are alpha instances that hold replicas of data, but this replication is to support read replicas, often in a different geography from the master cluster. This replication is implemented the same way as HA replication, but the learner nodes do not participate in quorum, and do not take over from failed nodes to provide high availability.
14 changes: 14 additions & 0 deletions content/dql/dql-schema.md
@@ -16,6 +16,9 @@ revenue: float .
running_time: int .
starring: [uid] .
director: [uid] .
description: string .
description_vector: float32vector @index(hnsw(metric:"cosine")) .
type Person {
name
@@ -28,6 +31,8 @@ type Film {
running_time
starring
director
description
description_vector
}
```

Expand Down Expand Up @@ -112,6 +117,15 @@ For all triples with a predicate of scalar types the object is a literal.
are RFC 3339 compatible which is different from ISO 8601(as defined in the RDF spec). You should
convert your values to RFC 3339 format before sending them to Dgraph.{{% /notice %}}

### Vector Type

The `float32vector` type denotes a vector of floating point numbers, i.e., an ordered array of float32 values. A node type can contain more than one vector predicate.

Vectors are normally used to store embeddings obtained from other information through an ML model. When a `float32vector` is [indexed]({{<relref "dql/predicate-indexing.md">}}), the DQL [similar_to]({{<relref "query-language/functions#vector-similarity-search">}}) function can be used for similarity search.




### UID Type

The `uid` type denotes a relationship; internally each node is identified by its UID which is a `uint64`.
35 changes: 35 additions & 0 deletions content/dql/predicate-indexing.md
@@ -9,6 +9,15 @@ weight = 4

Filtering on a predicate by applying a [function]({{< relref "query-language/functions.md" >}}) requires an index.

Indices are defined in the [Dgraph types schema]({{<relref "dql/dql-schema.md" >}}) using the `@index` directive.

Here are some examples:
```
name: string @index(term) .
release_date: datetime @index(year) .
description_vector: float32vector @index(hnsw(metric:"cosine")) .
```

When filtering by applying a function, Dgraph uses the index to make the search through a potentially large dataset efficient.

All scalar types can be indexed.
@@ -17,6 +26,8 @@ Types `int`, `float`, `bool` and `geo` have only a default index each: with toke

Types `string` and `dateTime` have a number of indices.

The `float32vector` type supports the `hnsw` index.

## String Indices
The indices available for strings are as follows.

@@ -34,6 +45,30 @@ transaction conflict rate. Use only the minimum number of and simplest indexes
that your application needs.
{{% /notice %}}

## Vector Indices

The indices available for `float32vector` are as follows.

| Dgraph function | Required index / tokenizer | Notes |
| :----------------------- | :------------ | :--- |
| `similar_to` | `hnsw` | HNSW index supports parameters `metric` and `exponent`. |


The `hnsw` (**Hierarchical Navigable Small World**) index supports the following parameters:
- metric: the metric used to compute vector similarity. One of `cosine`, `euclidean`, or `dotproduct`. Default is `euclidean`.

- exponent: an integer, represented as a string, that roughly gives the number of vectors expected in the index as a power of 10. The exponent value is used to set "reasonable defaults" for HNSW internal tuning parameters. Default is "4" (10^4 vectors).


Here are some examples:
```
simple_vector: float32vector @index(hnsw) .
description_vector: float32vector @index(hnsw(metric:"cosine")) .
large_vector: float32vector @index(hnsw(metric:"euclidean",exponent:"6")) .
```
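
To make the three `metric` options concrete, here is a minimal Python sketch of the underlying textbook quantities. It is an illustration only: the function names are hypothetical, and how Dgraph turns cosine similarity or dot product into a distance internally is not stated here, so treat that part as outside the sketch.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot_product(a, b):
    """Sum of element-wise products; larger means more aligned."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (range -1 to 1)."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)
```

Cosine similarity ignores vector length, while Euclidean distance and dot product are both sensitive to it, which is one reason the choice of metric usually follows from how the embedding model was trained.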

## DateTime Indices

The indices available for `dateTime` are as follows.
29 changes: 29 additions & 0 deletions content/graphql/mutations/mutations-overview.md
@@ -221,6 +221,35 @@ mutation {
}
```

## Vector Embedding mutations

For types with vector embeddings, Dgraph automatically generates the add mutation. The example below shows a schema followed by an add mutation that uses it.

```graphql
type User {
  userID: ID!
  name: String!
  name_v: [Float!] @embedding @search(by: ["hnsw(metric: euclidean, exponent: 4)"])
}

mutation {
  addUser(input: [
    { name: "iCreate with a Mini iPad", name_v: [0.12, 0.53, 0.9, 0.11, 0.32] },
    { name: "Resistive Touchscreen", name_v: [0.72, 0.89, 0.54, 0.15, 0.26] },
    { name: "Fitness Band", name_v: [0.56, 0.91, 0.93, 0.71, 0.24] },
    { name: "Smart Ring", name_v: [0.38, 0.62, 0.99, 0.44, 0.25] }])
  {
    user {
      userID
      name
      name_v
    }
  }
}
```

Note: The embeddings are generated outside of Dgraph using any suitable machine learning model.

## Examples

You can refer to the following [link](https://github.com/dgraph-io/dgraph/tree/main/graphql/schema/testdata/schemagen) for more examples.
2 changes: 1 addition & 1 deletion content/graphql/queries/aggregate.md
@@ -1,7 +1,7 @@
+++
title = "Aggregate Queries"
description = "Dgraph automatically generates aggregate queries for GraphQL schemas. These are compatible with the @auth directive."
weight = 3
weight = 4
[menu.main]
parent = "graphql-queries"
name = "Aggregate Queries"
64 changes: 64 additions & 0 deletions content/graphql/queries/vector-similarity.md
@@ -0,0 +1,64 @@
+++
title = "Similarity Search"
description = "Dgraph automatically generates GraphQL queries for each vector index that you define in your schema. There are two types of queries generated for each index."
weight = 3
[menu.main]
parent = "graphql-queries"
identifier = "vector-queries"
+++

Dgraph automatically generates two GraphQL similarity queries for each type that has at least one [vector predicate](/graphql/schema/types/#vectors) with the `@search` directive.

For example:

```graphql
type User {
  id: ID!
  name: String!
  name_v: [Float!] @embedding @search(by: ["hnsw(metric: euclidean, exponent: 4)"])
}
```

With the above schema, the auto-generated `querySimilar<Object>ByEmbedding` query allows us to run similarity search using the vector index specified in our schema.

```graphql
querySimilar<Object>ByEmbedding(
  by: vector_predicate,
  topK: n,
  vector: searchVector): [User]
```

For example, to find the top 3 users whose names are most similar to a given name embedding, use the following query.

```graphql
querySimilarUserByEmbedding(by: name_v, topK: 3, vector: [0.1, 0.2, 0.3, 0.4, 0.5]) {
  id
  name
  vector_distance
}
```
The results of this query include the 3 closest users, ordered by `vector_distance`. Here `vector_distance` is the Euclidean distance between the `name_v` embedding vector and the input vector used in the query.

Note: you can omit the `vector_distance` predicate from the query; the results are still ordered by `vector_distance`.

The distance metric is the one specified when the index is created.

Similarly, the auto-generated `querySimilar<Object>ById` query allows us to search for objects similar to an existing object, given its ID.

```graphql
querySimilar<Object>ById(
  by: vector_predicate,
  topK: n,
  id: userID): [User]
```

For example, the following query searches for the top 3 users whose names are most similar to the name of the user with id "0xef7".

```graphql
querySimilarUserById(by: name_v, topK: 3, id: "0xef7") {
  id
  name
  vector_distance
}
```

26 changes: 14 additions & 12 deletions content/graphql/quick-start/index.md
@@ -67,19 +67,21 @@ You may want to use the introspection capability of the client to explore the sc
To populate the database,
1. Open the [API Explorer](https://cloud.dgraph.io/_/explorer) tab
2. Paste the following code into the text area:
```graphql
mutation {
addProduct(input: [
{ name: "GraphQL on Dgraph"},
{ name: "Dgraph: The GraphQL Database"}
]) {
product {
productID
name
}
```graphql
mutation {
addProduct(
input: [
{ name: "GraphQL on Dgraph" }
{ name: "Dgraph: The GraphQL Database" }
]
) {
product {
productID
name
}
addCustomer(input: [{ username: "Michael"}]) {
customer {
}
addCustomer(input: [{ username: "Michael" }]) {
customer {
username
}
}
6 changes: 6 additions & 0 deletions content/graphql/schema/directives/_index.md
@@ -38,6 +38,12 @@ Reference: [Deprecation]({{< relref "deprecated.md" >}})

Reference: [@dgraph directive]({{< relref "directive-dgraph" >}})

### @embedding

The `@embedding` directive designates one or more fields as vector embeddings.

Reference: [@embedding directive]({{< relref "embedding" >}})

### @generate

The `@generate` directive is used to specify which GraphQL APIs are generated for a type.
13 changes: 13 additions & 0 deletions content/graphql/schema/directives/embedding.md
@@ -0,0 +1,13 @@
+++
title = "@embedding"
weight = 1
[menu.main]
parent = "directives"
+++


A Float array can be used as a vector by applying the `@embedding` directive. It denotes a vector of floating point numbers, i.e., an ordered array of float32 values.

Embeddings can be defined on one or more predicates of a type, and they are generated using suitable machine learning models.

This directive is used in conjunction with the `@search` directive to declare the HNSW index. For more information, see the [@search](/graphql/schema/directives/search/#vector-embedding) directive for vector embeddings.
20 changes: 20 additions & 0 deletions content/graphql/schema/directives/search.md
@@ -624,3 +624,23 @@ query {
}
}
```

### Vector embedding

The `@search` directive is used in conjunction with the `@embedding` directive to define the HNSW index on vector embeddings. These vector embeddings are obtained from external machine learning models.

```graphql
type User {
  userID: ID!
  name: String!
  name_v: [Float!] @embedding @search(by: ["hnsw(metric: euclidean, exponent: 4)"])
}
```

In this schema, the field `name_v` is an embedding on which the HNSW algorithm is used to create a vector search index.

The metric used to compute the distance between vectors (in this example) is Euclidean distance. Other possible metrics are `cosine` and `dotproduct`.

The directive, `@embedding`, designates one or more fields as vector embeddings.

The `exponent` value is used to set reasonable defaults for HNSW internal tuning parameters. It is an integer giving the approximate number of vectors expected in the index as a power of 10. The default is “4” (10^4 vectors).
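
As a rough illustration of choosing `exponent`, the Python sketch below derives it from an expected vector count and formats the `@search` argument in the same shape as the schema above. The helper names are hypothetical, and rounding to the nearest power of 10 is an assumption made for this sketch, not a rule stated by Dgraph.

```python
import math


def suggest_exponent(expected_vectors):
    """Return the power of 10 nearest to the expected number of vectors.

    Rounding to the nearest power is an assumption of this sketch.
    """
    if expected_vectors < 1:
        raise ValueError("expected_vectors must be at least 1")
    return round(math.log10(expected_vectors))


def search_argument(metric, expected_vectors):
    """Format the hnsw index argument used inside @search(by: [...])."""
    exponent = suggest_exponent(expected_vectors)
    return f"hnsw(metric: {metric}, exponent: {exponent})"
```

For about ten thousand vectors this yields `hnsw(metric: euclidean, exponent: 4)`, which matches the default described above.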
20 changes: 20 additions & 0 deletions content/graphql/schema/types.md
@@ -51,6 +51,26 @@ type User {

Scalar lists in Dgraph act more like sets, so `tags: [String]` would always contain unique tags. Similarly, `recentScores: [Float]` could never contain duplicate scores.

### Vectors

A Float array can be used as a vector by applying the `@embedding` directive. It denotes a vector of floating point numbers, i.e., an ordered array of float32 values. A type can contain more than one vector predicate.

Vectors are normally used to store embeddings obtained from an ML model.

When a Float vector is indexed, the GraphQL `querySimilar<type name>ByEmbedding` and `querySimilar<type name>ById` functions can be used for [similarity search]({{<relref "vector-similarity.md">}}).

A simple example of adding a vector embedding on `name` to the `User` type is shown below.

```graphql
type User {
  userID: ID!
  name: String!
  name_v: [Float!] @embedding @search(by: ["hnsw(metric: euclidean, exponent: 4)"])
}
```

In this schema, the field `name_v` is an embedding on which the [@search](/graphql/schema/directives/search/#vector-embedding) directive for vector embeddings is used.

### The `ID` type

In Dgraph, every node has a unique 64-bit identifier that you can expose in GraphQL using the `ID` type. An `ID` is auto-generated, immutable and never reused. Each type can have at most one `ID` field.
15 changes: 15 additions & 0 deletions content/query-language/functions.md
@@ -177,6 +177,21 @@ Same query with a Levenshtein distance of 3.
}
{{< /runnable >}}

## Vector Similarity Search

Syntax Examples: `similar_to(predicate, 3, "[0.9, 0.8, 0, 0]")`

Alternatively, the vector can be passed as a variable: `similar_to(predicate, 3, $vec)`

This function finds the nodes that have `predicate` close to the provided vector. The search is based on the distance metric specified in the index (`cosine`, `euclidean`, or `dotproduct`). A shorter distance indicates greater similarity.
The second parameter, `3`, specifies that the top 3 matches are returned.

Schema Types: `float32vector`

Index Required: `hnsw`
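
Conceptually, a top-k similarity search returns the k stored vectors nearest to the query vector. The Python sketch below shows an exact, brute-force version of that idea using Euclidean distance. It is not how Dgraph executes `similar_to`: an HNSW index answers the same question approximately without scanning every vector. The function names and the `uid -> vector` dictionary layout are assumptions made for this sketch.

```python
import heapq
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_similar(vectors, query, k):
    """Return the k nearest entries as (distance, uid) pairs, closest first.

    `vectors` maps a hypothetical uid string to its stored vector.
    Every vector is scanned, so the cost grows linearly with the data size.
    """
    scored = ((euclidean_distance(vec, query), uid) for uid, vec in vectors.items())
    return heapq.nsmallest(k, scored)
```

A brute-force scan like this is useful as a correctness baseline when evaluating the recall of an approximate index.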



## Full-Text Search

Syntax Examples: `alloftext(predicate, "space-separated text")` and `anyoftext(predicate, "space-separated text")`