Commit

Merge pull request #64 from getdozer/grpc-ingest
feat: add default adapter for grpc ingestion
chubei authored Dec 1, 2023
2 parents c263b1e + 5fcce6a commit 0927639
Showing 1 changed file with 168 additions and 9 deletions.
177 changes: 168 additions & 9 deletions docs/sources/grpc.md
@@ -5,30 +5,189 @@ description: Allows the programmatic ingestion of events using the streaming gRP
# gRPC Ingestion

The gRPC connector ingests data into Dozer by accepting records pushed to a gRPC endpoint in a streaming fashion.
It supports Apache Arrow data format.
It supports Dozer data format and Apache Arrow data format.
The Dozer ecosystem provides Python and React clients that make it easy to ingest data into Dozer over gRPC.

## Configuration
## Default Adapter
### Configuration

To use the gRPC connector, the `config` parameter of the connection must be set to `!Grpc`.
The following configuration block can be used in `dozer-config.yaml` to define a new gRPC connection:

```yaml
connections:
  - config: !Grpc
      adapter: arrow
      adapter: default
      schemas: !Path "trips.json"
      port: 7005
    name: trips_arrow
    name: trips_default
```
### Parameters
| Name | Type | Description |
|--------------------|-----------|-----------------------------------------------------------------------------------------------------------------------------------------|
| `adapter` | String | The adapter to use to ingest data. |
| `schemas` | String | The path to the JSON file containing the schema definitions of the tables to ingest. |
| `port` | Integer | The port on which the gRPC server will listen for incoming data. |
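
As a rough illustration of how these parameters fit together, the following Python sketch checks a connection entry before it is written to `dozer-config.yaml`. The `check_grpc_connection` helper and its dictionary layout are hypothetical and not part of Dozer. The accepted adapter names are the two mentioned on this page, and the port range is the general TCP range rather than a documented Dozer limit.

```python
# Hypothetical helper, not part of Dozer: sanity-checks the gRPC connection
# parameters described in the table above.
ADAPTERS = {"default", "arrow"}  # adapter names mentioned on this page


def check_grpc_connection(config: dict) -> list[str]:
    """Return a list of problems found in a gRPC connection config."""
    problems = []
    if config.get("adapter") not in ADAPTERS:
        problems.append(f"unknown adapter: {config.get('adapter')!r}")
    if not isinstance(config.get("schemas"), str):
        problems.append("schemas must be a path string")
    port = config.get("port")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append(f"port must be an integer in 1..65535, got {port!r}")
    return problems


if __name__ == "__main__":
    print(check_grpc_connection({"adapter": "default", "schemas": "trips.json", "port": 7005}))
```

An empty list means the entry passed these basic checks. It does not guarantee that Dozer itself will accept the configuration.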

### Notes

The Dozer schema definition uses JSON format. Here is an example:

```json
{
  "trips": {
    "schema": {
      "fields": [
        {
          "typ": "String",
          "name": "hvfhs_license_num",
          "nullable": true
        },
        {
          "typ": "String",
          "name": "dispatching_base_num",
          "nullable": true
        },
        {
          "typ": "String",
          "name": "originating_base_num",
          "nullable": true
        },
        {
          "typ": "Timestamp",
          "name": "request_datetime",
          "nullable": true
        },
        {
          "typ": "Timestamp",
          "name": "on_scene_datetime",
          "nullable": true
        },
        {
          "typ": "Timestamp",
          "name": "pickup_datetime",
          "nullable": true
        },
        {
          "typ": "Timestamp",
          "name": "dropoff_datetime",
          "nullable": true
        },
        {
          "typ": "Int",
          "name": "PULocationID",
          "nullable": true
        },
        {
          "typ": "Int",
          "name": "DOLocationID",
          "nullable": true
        },
        {
          "typ": "Float",
          "name": "trip_miles",
          "nullable": true
        },
        {
          "typ": "Int",
          "name": "trip_time",
          "nullable": true
        },
        {
          "typ": "Float",
          "name": "base_passenger_fare",
          "nullable": true
        },
        {
          "typ": "Float",
          "name": "tolls",
          "nullable": true
        },
        {
          "typ": "Float",
          "name": "bcf",
          "nullable": true
        },
        {
          "typ": "Float",
          "name": "sales_tax",
          "nullable": true
        },
        {
          "typ": "Float",
          "name": "congestion_surcharge",
          "nullable": true
        },
        {
          "typ": "Float",
          "name": "airport_fee",
          "nullable": true
        },
        {
          "typ": "Float",
          "name": "tips",
          "nullable": true
        },
        {
          "typ": "Float",
          "name": "driver_pay",
          "nullable": true
        },
        {
          "typ": "String",
          "name": "shared_request_flag",
          "nullable": true
        },
        {
          "typ": "String",
          "name": "shared_match_flag",
          "nullable": true
        },
        {
          "typ": "String",
          "name": "access_a_ride_flag",
          "nullable": true
        },
        {
          "typ": "String",
          "name": "wav_request_flag",
          "nullable": true
        },
        {
          "typ": "String",
          "name": "wav_match_flag",
          "nullable": true
        }
      ]
    }
  }
}
```
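
To see how such a file is laid out, the schema can be read with Python's standard `json` module. This is only a sketch: the `list_fields` helper is hypothetical and not part of Dozer, and it assumes the table, `schema`, `fields` nesting shown in the example above.

```python
import json


def list_fields(schemas_json: str) -> dict:
    """Map each table name to a list of (name, typ, nullable) tuples."""
    schemas = json.loads(schemas_json)
    return {
        table: [(f["name"], f["typ"], f["nullable"]) for f in definition["schema"]["fields"]]
        for table, definition in schemas.items()
    }


if __name__ == "__main__":
    # A shortened version of the example above, with two of its fields.
    sample = """
    {
      "trips": {
        "schema": {
          "fields": [
            {"typ": "String", "name": "hvfhs_license_num", "nullable": true},
            {"typ": "Float", "name": "trip_miles", "nullable": true}
          ]
        }
      }
    }
    """
    for table, fields in list_fields(sample).items():
        for name, typ, nullable in fields:
            print(f"{table}.{name}: {typ} (nullable={nullable})")
```

Listing the fields this way can help confirm that the records a client sends match the table and column names declared in the schema file.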

## Parameters

* **adapter**: the adapter to use to ingest data. Currently, only `arrow` is supported.
* **schemas**: the path to the JSON file containing the schema definitions of the tables to ingest.
* **port**: the port on which the gRPC server will listen for incoming data.
## Arrow Adapter
### Configuration

To use the gRPC connector, the `config` parameter of the connection must be set to `!Grpc`.
The following configuration block can be used in `dozer-config.yaml` to define a new gRPC connection:

```yaml
connections:
  - config: !Grpc
      adapter: arrow
      schemas: !Path "trips.json"
      port: 7005
    name: trips_arrow
```

## Notes
### Parameters
| Name | Type | Description |
|--------------------|-----------|-----------------------------------------------------------------------------------------------------------------------------------------|
| `adapter` | String | The adapter to use to ingest data. |
| `schemas` | String | The path to the JSON file containing the schema definitions of the tables to ingest. |
| `port` | Integer | The port on which the gRPC server will listen for incoming data. |
### Notes

The Arrow schema definition uses JSON format. Here is an example:
