feat: int test, docs (#1)
Anush008 committed Jun 25, 2024
1 parent 2e8d5e3 commit fc0271d
Showing 7 changed files with 628 additions and 66 deletions.
34 changes: 34 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,34 @@
name: Test

on:
  pull_request:
    types:
      - opened
      - edited
      - synchronize
      - reopened

permissions:
  contents: write
  checks: write

jobs:
  build:
    name: Build
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up JDK 17
        uses: actions/setup-java@v3
        with:
          java-version: '17'
          distribution: 'temurin'

      - name: Build And Test
        uses: gradle/gradle-build-action@v2
        with:
          gradle-version: 8.5
          arguments: build
225 changes: 225 additions & 0 deletions README.md
@@ -0,0 +1,225 @@
# Qdrant Kafka Connector

Use Qdrant as a sink destination in [Kafka Connect](https://docs.confluent.io/platform/current/connect/index.html). The connector streams dense and sparse vectors into Qdrant collections.

## Installation

- Download the latest connector zip file from [GitHub Releases](https://github.com/qdrant/qdrant-kafka/releases).

- Refer to the first 3 steps of the [Kafka Quickstart](https://kafka.apache.org/quickstart#quickstart_download) to set up a local Kafka instance and create a topic named `topic_0`.

- Navigate to the Kafka installation directory.

- Unzip the archive and copy the `qdrant-kafka-xxx` directory to the `libs` directory of your Kafka installation.

- Update the `connect-standalone.properties` file in the `config` directory of your Kafka installation.

```properties
key.converter.schemas.enable=false
value.converter.schemas.enable=false
plugin.path=libs/qdrant-kafka-xxx
```

- Create a `qdrant-kafka.properties` file in the `config` directory of your Kafka installation.

```properties
name=qdrant-kafka
connector.class=io.qdrant.kafka.QdrantSinkConnector
qdrant.grpc.url=https://xyz-example.eu-central.aws.cloud.qdrant.io:6334
qdrant.api.key=<paste-your-api-key-here>
topics=topic_0
```

- Start the connector with the configured properties:

```sh
bin/connect-standalone.sh config/connect-standalone.properties config/qdrant-kafka.properties
```

## Usage

> [!IMPORTANT]
> Before loading data with this connector, a collection must be [created](https://qdrant.tech/documentation/concepts/collections/#create-a-collection) in advance with the appropriate vector dimensions and configuration.

You can now produce messages to the `topic_0` topic you created with the following command, and they will be streamed to the configured Qdrant instance.

```sh
bin/kafka-console-producer.sh --topic topic_0 --bootstrap-server localhost:9092
> { "collection_name": "{collection_name}", "id": 1, "vector": [ 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 ], "payload": { "name": "kafka", "description": "Kafka is a distributed streaming platform", "url": "https://kafka.apache.org/" } }
```
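
The console producer reads one record per input line, so a pretty-printed JSON document has to be collapsed onto a single line before it is sent. The following is a minimal Python sketch of that step; the helper name `to_record` and the collection name `my_collection` are illustrative assumptions, not part of the connector.

```python
import json


def to_record(message):
    """Serialize one point message as a single-line JSON record."""
    # No indent and compact separators keep the whole record on one line,
    # which is what a line-oriented producer expects.
    return json.dumps(message, separators=(",", ":"))


if __name__ == "__main__":
    point = {
        "collection_name": "my_collection",  # hypothetical collection name
        "id": 1,
        "vector": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
        "payload": {"name": "kafka", "url": "https://kafka.apache.org/"},
    }
    print(to_record(point))
```

If the script is saved under a name of your choice, its output can be piped into the `bin/kafka-console-producer.sh` command shown above instead of typing the record at the prompt.
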

This sink connector supports ingesting multiple named/unnamed, dense/sparse vectors.

_Click each to expand._

<details>
<summary><b>Unnamed/Default vector</b></summary>

```json
{
  "collection_name": "{collection_name}",
  "id": 1,
  "vector": [
    0.1,
    0.2,
    0.3,
    0.4,
    0.5,
    0.6,
    0.7,
    0.8
  ],
  "payload": {
    "name": "kafka",
    "description": "Kafka is a distributed streaming platform",
    "url": "https://kafka.apache.org/"
  }
}
```

</details>

<details>
<summary><b>Named vector</b></summary>

```json
{
  "collection_name": "{collection_name}",
  "id": 1,
  "vector": {
    "some-dense": [
      0.1,
      0.2,
      0.3,
      0.4,
      0.5,
      0.6,
      0.7,
      0.8
    ],
    "some-other-dense": [
      0.1,
      0.2,
      0.3,
      0.4,
      0.5,
      0.6,
      0.7,
      0.8
    ]
  },
  "payload": {
    "name": "kafka",
    "description": "Kafka is a distributed streaming platform",
    "url": "https://kafka.apache.org/"
  }
}
```

</details>

<details>
<summary><b>Sparse vectors</b></summary>

```json
{
  "collection_name": "{collection_name}",
  "id": 1,
  "shard_key_selector": [5235],
  "vector": {
    "some-sparse": {
      "indices": [
        0,
        1,
        2,
        3,
        4,
        5,
        6,
        7,
        8,
        9
      ],
      "values": [
        0.1,
        0.2,
        0.3,
        0.4,
        0.5,
        0.6,
        0.7,
        0.8,
        0.9,
        1.0
      ]
    }
  },
  "payload": {
    "name": "kafka",
    "description": "Kafka is a distributed streaming platform",
    "url": "https://kafka.apache.org/"
  }
}
```

</details>
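
A sparse vector in the message above is a pair of parallel arrays: `indices` lists the positions that carry a value and `values` holds the value at each of those positions. As a rough sketch, a dense list can be converted into that shape as follows; the function name `dense_to_sparse` is hypothetical and not part of the connector.

```python
def dense_to_sparse(dense):
    """Return the non-zero entries of a dense vector as parallel index/value lists."""
    # Keep only positions whose value is non-zero.
    indices = [i for i, value in enumerate(dense) if value != 0]
    # Look up the value for each kept position so both lists stay aligned.
    values = [dense[i] for i in indices]
    return {"indices": indices, "values": values}


if __name__ == "__main__":
    print(dense_to_sparse([0.0, 0.5, 0.0, 1.0]))
```

The returned dictionary can be placed under a vector name such as `some-sparse` inside the `vector` object of a message.
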

<details>
<summary><b>Combination of named dense and sparse vectors</b></summary>

```json
{
  "collection_name": "{collection_name}",
  "id": "a10435b5-2a58-427a-a3a0-a5d845b147b7",
  "shard_key_selector": ["some-key"],
  "vector": {
    "some-other-dense": [
      0.1,
      0.2,
      0.3,
      0.4,
      0.5,
      0.6,
      0.7,
      0.8
    ],
    "some-sparse": {
      "indices": [
        0,
        1,
        2,
        3,
        4,
        5,
        6,
        7,
        8,
        9
      ],
      "values": [
        0.1,
        0.2,
        0.3,
        0.4,
        0.5,
        0.6,
        0.7,
        0.8,
        0.9,
        1.0
      ]
    }
  },
  "payload": {
    "name": "kafka",
    "description": "Kafka is a distributed streaming platform",
    "url": "https://kafka.apache.org/"
  }
}
```

</details>
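
Because the collection has to exist before any of these messages arrive, its vector configuration must declare the same vector names used in the messages. The sketch below builds a `PUT /collections/{name}` request for Qdrant's REST API that matches the combined example. It assumes a local instance at `http://localhost:6333`, a collection called `my_collection`, and cosine distance; none of these come from this README. Shard keys referenced by `shard_key_selector` are not covered here.

```python
import json
import urllib.request


def create_collection_request(base_url, name, dense, sparse, api_key=None):
    """Build (but do not send) a request that creates a Qdrant collection."""
    body = {
        # One entry per named dense vector: its dimension and distance metric.
        # "Cosine" is an assumption; pick the metric that suits your data.
        "vectors": {
            vec_name: {"size": size, "distance": "Cosine"}
            for vec_name, size in dense.items()
        },
        # Sparse vectors are declared by name; an empty config uses defaults.
        "sparse_vectors": {vec_name: {} for vec_name in sparse},
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["api-key"] = api_key
    return urllib.request.Request(
        url=f"{base_url}/collections/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="PUT",
    )


request = create_collection_request(
    "http://localhost:6333",      # assumed local REST endpoint
    "my_collection",              # hypothetical collection name
    dense={"some-other-dense": 8},
    sparse=["some-sparse"],
)
# urllib.request.urlopen(request)  # uncomment to send to a running Qdrant
```

The request is only constructed, not sent, so the body can be inspected before it touches a real instance.
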

## LICENSE

Apache 2.0 © [2024](https://github.com/qdrant/qdrant-kafka/blob/main/LICENSE)