Skip to content

Commit

Permalink
update readme
Browse files Browse the repository at this point in the history
  • Loading branch information
ruslandoga committed Oct 18, 2024
1 parent 4b2f257 commit 1b7919f
Show file tree
Hide file tree
Showing 23 changed files with 179 additions and 51 deletions.
69 changes: 53 additions & 16 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,25 +3,25 @@
[![Documentation badge](https://img.shields.io/badge/Documentation-ff69b4)](https://hexdocs.pm/ch)
[![Hex.pm badge](https://img.shields.io/badge/Package%20on%20hex.pm-informational)](https://hex.pm/packages/ch)

Minimal HTTP ClickHouse client for Elixir.
Minimal ClickHouse client for Elixir.

Used in [Ecto ClickHouse adapter.](https://github.com/plausible/ecto_ch)

### Key features

- RowBinary
- Native query parameters
- Per query settings
- Minimal API

Your ideas are welcome [here.](https://github.com/plausible/ch/issues/82)
- HTTP or Native
- [Multinode support](./guides/multihost.md)
- [Compression](./guides/compression.md)

## Installation

```elixir
defp deps do
[
{:ch, "~> 0.2.0"}
{:ch, "~> 0.3.0"}
]
end
```
Expand Down Expand Up @@ -60,7 +60,7 @@ Note on datetime encoding in query parameters:

- `%NaiveDateTime{}` is encoded as text to make it assume the column's or ClickHouse server's timezone
- `%DateTime{time_zone: "Etc/UTC"}` is encoded as unix timestamp and is treated as UTC timestamp by ClickHouse
- encoding non UTC `%DateTime{}` raises `ArgumentError`
- encoding non-UTC `%DateTime{}` requires a [time zone database](https://hexdocs.pm/elixir/1.17.1/DateTime.html#module-time-zone-database)

#### Insert rows

Expand Down Expand Up @@ -100,10 +100,10 @@ types = [:u64, :string]
rowbinary = Ch.RowBinary.encode_rows(rows, types)

%Ch.Result{num_rows: 2} =
Ch.query!(pid, ["INSERT INTO ch_demo(id) FORMAT RowBinary\n" | rowbinary])
Ch.query!(pid, ["INSERT INTO ch_demo(id, text) FORMAT RowBinary\n" | rowbinary])
```

Similarly, you can use [`RowBinaryWithNamesAndTypes`](https://clickhouse.com/docs/en/interfaces/formats#rowbinarywithnamesandtypes) which would additionally do something like a type check.
Similarly, you can use [`RowBinaryWithNamesAndTypes`](https://clickhouse.com/docs/en/interfaces/formats#rowbinarywithnamesandtypes) which would additionally do something not quite unlike a type check.

```elixir
sql = "INSERT INTO ch_demo FORMAT RowBinaryWithNamesAndTypes\n"
Expand All @@ -116,22 +116,38 @@ rows = [
types = ["UInt64", "String"]
names = ["id", "text"]

data = [
rowbinary_with_names_and_types = [
Ch.RowBinary.encode_names_and_types(names, types),
Ch.RowBinary.encode_rows(rows, types)
]

%Ch.Result{num_rows: 2} = Ch.query!(pid, [sql | data])
%Ch.Result{num_rows: 2} =
Ch.query!(pid, [sql | rowbinary_with_names_and_types])
```

And you can use buffer helpers too. They are available for RowBinary, RowBinaryWithNamesAndTypes, and Native formats.

```elixir
buffer = Ch.RowBinary.new_buffer(_types = ["UInt64", "String"])
buffer = Ch.RowBinary.push_buffer(buffer, [[0, "a"], [1, "b"]])
rowbinary = Ch.RowBinary.buffer_to_iodata(buffer)

%Ch.Result{num_rows: 2} =
Ch.query!(pid, ["INSERT INTO ch_demo(id, text) FORMAT RowBinary\n" | rowbinary])
```

#### Insert rows in some other [format](https://clickhouse.com/docs/en/interfaces/formats)

```elixir
{:ok, pid} = Ch.start_link()

Ch.query!(pid, "CREATE TABLE IF NOT EXISTS ch_demo(id UInt64) ENGINE Null")
Ch.query!(pid, "CREATE TABLE IF NOT EXISTS ch_demo(id UInt64, text String) ENGINE Null")

csv = [0, 1] |> Enum.map(&to_string/1) |> Enum.intersperse(?\n)
csv =
"""
0,"a"\n
1,"b"\
"""

%Ch.Result{num_rows: 2} =
Ch.query!(pid, ["INSERT INTO ch_demo(id) FORMAT CSV\n" | csv])
Expand All @@ -156,6 +172,20 @@ end)

This query makes a [`transfer-encoding: chunked`] HTTP request while unfolding the stream resulting in lower memory usage.

#### Stream from a file

```elixir
{:ok, pid} = Ch.start_link()

Ch.query!(pid, "CREATE TABLE IF NOT EXISTS ch_demo(id UInt64) ENGINE Null")

DBConnection.run(pid, fn conn ->
File.stream!("buffer.tmp", _bytes = 2048)
|> Stream.into(Ch.stream(conn, "INSERT INTO ch_demo(id) FORMAT RowBinary\n"))
|> Stream.run()
end)
```

#### Query with custom [settings](https://clickhouse.com/docs/en/operations/settings/settings)

```elixir
Expand Down Expand Up @@ -265,7 +295,7 @@ Mix.install([:ch, :tz])
"2023-04-26 01:45:12+08:00 CST Asia/Taipei" = to_string(taipei)
```

Encoding non-UTC datetimes raises an `ArgumentError`
Encoding non-UTC datetimes is possible but slow.

```elixir
Ch.query!(pid, "CREATE TABLE ch_datetimes(datetime DateTime) ENGINE Null")
Expand All @@ -274,10 +304,17 @@ naive = NaiveDateTime.utc_now()
utc = DateTime.utc_now()
taipei = DateTime.shift_zone!(utc, "Asia/Taipei")

# ** (ArgumentError) non-UTC timezones are not supported for encoding: 2023-04-26 01:49:43.044569+08:00 CST Asia/Taipei
Ch.RowBinary.encode_rows([[naive], [utc], [taipei]], ["DateTime"])
rows = [
[naive],
[utc],
[taipei]
]

types = ["DateTime"]

Ch.RowBinary.encode_rows(rows, types)
```

## Benchmarks

Please see [CI Results](https://github.com/plausible/ch/actions/workflows/bench.yml) (make sure to click the latest workflow run and scroll down to "Artifacts") for [some of our benchmarks.](./bench/) :)
Please see [CI Results](https://github.com/plausible/ch/actions/workflows/bench.yml) (make sure to click the latest workflow run and scroll down to "Artifacts") for [some of our benchmarks.](./bench/)
Empty file added bench/buffer.exs
Empty file.
Empty file added bench/decode.exs
Empty file.
55 changes: 55 additions & 0 deletions bench/encode.exs
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
IO.puts("This benchmark is based on https://github.com/ClickHouse/clickhouse-go#benchmark\n")

port = String.to_integer(System.get_env("CH_PORT") || "8123")
hostname = System.get_env("CH_HOSTNAME") || "localhost"
scheme = System.get_env("CH_SCHEME") || "http"
database = System.get_env("CH_DATABASE") || "ch_bench"

{:ok, conn} = Ch.start_link(scheme: scheme, hostname: hostname, port: port)
Ch.query!(conn, "CREATE DATABASE IF NOT EXISTS {$0:Identifier}", [database])

Ch.query!(conn, """
CREATE TABLE IF NOT EXISTS #{database}.benchmark (
col1 UInt64,
col2 String,
col3 Array(UInt8),
col4 DateTime
) Engine Null
""")

types = [Ch.Types.u64(), Ch.Types.string(), Ch.Types.array(Ch.Types.u8()), Ch.Types.datetime()]
statement = "INSERT INTO #{database}.benchmark FORMAT RowBinary"

rows = fn count ->
Enum.map(1..count, fn i ->
[i, "Golang SQL database driver", [1, 2, 3, 4, 5, 6, 7, 8, 9], NaiveDateTime.utc_now()]
end)
end

alias Ch.RowBinary

Benchee.run(
%{
# "control" => fn rows -> Enum.each(rows, fn _row -> :ok end) end,
"encode" => fn rows -> RowBinary.encode_rows(rows, types) end,
"insert" => fn rows -> Ch.query!(conn, statement, rows, types: types) end,
# "control stream" => fn rows -> rows |> Stream.chunk_every(60_000) |> Stream.run() end,
"encode stream" => fn rows ->
rows
|> Stream.chunk_every(60_000)
|> Stream.map(fn chunk -> RowBinary.encode_rows(chunk, types) end)
|> Stream.run()
end,
"insert stream" => fn rows ->
stream =
rows
|> Stream.chunk_every(60_000)
|> Stream.map(fn chunk -> RowBinary.encode_rows(chunk, types) end)

Ch.query!(conn, statement, stream, encode: false)
end
},
inputs: %{
"1_000_000 rows" => rows.(1_000_000)
}
)
Empty file added bench/http.exs
Empty file.
9 changes: 2 additions & 7 deletions bench/insert.exs
Original file line number Diff line number Diff line change
Expand Up @@ -31,15 +31,10 @@ alias Ch.RowBinary
Benchee.run(
%{
# "control" => fn rows -> Enum.each(rows, fn _row -> :ok end) end,
"encode" => fn rows -> RowBinary.encode_rows(rows, types) end,

"insert" => fn rows -> Ch.query!(conn, statement, rows, types: types) end,
# "control stream" => fn rows -> rows |> Stream.chunk_every(60_000) |> Stream.run() end,
"encode stream" => fn rows ->
rows
|> Stream.chunk_every(60_000)
|> Stream.map(fn chunk -> RowBinary.encode_rows(chunk, types) end)
|> Stream.run()
end,

"insert stream" => fn rows ->
stream =
rows
Expand Down
Empty file added bench/native.exs
Empty file.
Empty file added bench/types.exs
Empty file.
1 change: 1 addition & 0 deletions guides/in_memory_insert_buffer.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
# In-memory INSERT buffer
3 changes: 3 additions & 0 deletions guides/multinode.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# Connecting to multiple nodes

Similar to https://clickhouse.com/docs/en/integrations/go#connecting-to-multiple-nodes
37 changes: 37 additions & 0 deletions guides/on_disk_insert_buffer.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
# On-disk INSERT buffer

Here how you could do it

```elixir
defmodule WriteBuffer do
use GenServer

# 5 MB
max_buffer_size = 5_000_000

def insert(rows) do
row_binary = Ch.RowBinary.encode_many(rows, unquote(encoding_types))
GenServer.call(__MODULE__, {:buffer, row_binary})
end

def init(opts) do
{:ok, fd} = :file.open()
%{fd: fd, buffer_size: 0}
end

def handle_call({:buffer, row_binary}, _from, state) do
new_buffer_size = state.buffer_size + IO.iodata_length(row_binary)
:file.write(state.fd, row_binary)

if new_buffer_size < unquote(max_buffer_size) do
%{state | buffer_size: new_buffer_size}
else
flush(state)
end
end
end
```

See [tests](../test/ch/on_disk_buffer_test.exs) for more.

TODO: notes on using it in docker and "surviving" restarts
12 changes: 2 additions & 10 deletions lib/ch.ex
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,6 @@ defmodule Ch do
| {:command, Ch.Query.command()}
| {:headers, [{String.t(), String.t()}]}
| {:format, String.t()}
| {:decode, boolean}
| DBConnection.connection_option()

@doc """
Expand All @@ -71,8 +70,7 @@ defmodule Ch do
* `:timeout` - Configures both query request timeout and HTTP receive timeout in milliseconds, whichever happens faster
* `:command` - Command tag for the query
* `:headers` - Custom HTTP headers for the request
* `:format` - Custom response format for the request
* `:decode` - Whether to automatically decode the response
* `:format` - Custom response format for the request, if provided, the response is not decoded automatically
* [`DBConnection.connection_option()`](https://hexdocs.pm/db_connection/DBConnection.html#t:connection_option/0)
"""
Expand Down Expand Up @@ -105,13 +103,7 @@ defmodule Ch do
%Ch.Stream{conn: conn, query: query, params: params, opts: opts}
end

# TODO drop
@doc false
@spec run(DBConnection.conn(), (DBConnection.t() -> any), Keyword.t()) :: any
def run(conn, f, opts \\ []) when is_function(f, 1) do
DBConnection.run(conn, f, opts)
end

# TODO need it?
if Code.ensure_loaded?(Ecto.ParameterizedType) do
@behaviour Ecto.ParameterizedType

Expand Down
2 changes: 1 addition & 1 deletion lib/ch/error.ex
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
defmodule Ch.Error do
@moduledoc "Error struct wrapping ClickHouse error responses."
defexception [:code, :message]
@type t :: %__MODULE__{code: pos_integer | nil, message: String.t()}
@type t :: %__MODULE__{code: pos_integer | nil, message: binary}
end
2 changes: 2 additions & 0 deletions lib/ch/http.ex
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
defmodule Ch.HTTP do
end
3 changes: 3 additions & 0 deletions lib/ch/native.ex
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
defmodule Ch.Native do
@moduledoc false
end
9 changes: 3 additions & 6 deletions lib/ch/query.ex
Original file line number Diff line number Diff line change
@@ -1,16 +1,13 @@
defmodule Ch.Query do
@moduledoc "Query struct wrapping the SQL statement."
defstruct [:statement, :command, :encode, :decode]

@type t :: %__MODULE__{statement: iodata, command: command, encode: boolean, decode: boolean}
defstruct [:statement, :command]
@type t :: %__MODULE__{statement: iodata, command: command}

@doc false
@spec build(iodata, [Ch.query_option()]) :: t
def build(statement, opts \\ []) do
command = Keyword.get(opts, :command) || extract_command(statement)
encode = Keyword.get(opts, :encode, true)
decode = Keyword.get(opts, :decode, true)
%__MODULE__{statement: statement, command: command, encode: encode, decode: decode}
%__MODULE__{statement: statement, command: command}
end

statements = [
Expand Down
8 changes: 3 additions & 5 deletions lib/ch/result.ex
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,9 @@ defmodule Ch.Result do
@moduledoc """
Result struct returned from any successful query. Its fields are:
* `command` - An atom of the query command, for example: `:select`, `:insert`;
* `command` - An atom of the query command, for example: `:select`, `:insert`
* `rows` - A list of lists, each inner list corresponding to a row, each element in the inner list corresponds to a column
* `num_rows` - The number of fetched or affected rows;
* `headers` - The HTTP response headers
* `num_rows` - The number of fetched or affected rows
* `data` - The raw iodata from the response
"""

Expand All @@ -14,8 +13,7 @@ defmodule Ch.Result do
@type t :: %__MODULE__{
command: Ch.Query.command(),
num_rows: non_neg_integer | nil,
rows: [[term]] | iodata | nil,
headers: Mint.Types.headers(),
rows: [[term]] | nil,
data: iodata
}
end
2 changes: 2 additions & 0 deletions lib/ch/row_binary.ex
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
defmodule Ch.RowBinary do
@moduledoc "Helpers for working with ClickHouse [`RowBinary`](https://clickhouse.com/docs/en/sql-reference/formats#rowbinary) format."

# TODO cleanup

# @compile {:bin_opt_info, true}
@dialyzer :no_improper_lists

Expand Down
3 changes: 3 additions & 0 deletions lib/ch/ssl.ex
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
defmodule Ch.SSL do
@moduledoc false
end
1 change: 1 addition & 0 deletions lib/ch/stream.ex
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ defmodule Ch.Stream do
def slice(_), do: {:error, __MODULE__}
end

# TODO optimize
defimpl Collectable do
def into(stream) do
%Ch.Stream{conn: conn, query: query, params: params, opts: opts} = stream
Expand Down
2 changes: 2 additions & 0 deletions lib/ch/types.ex
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,8 @@ defmodule Ch.Types do
Helpers to turn ClickHouse types into Elixir terms for easier processing.
"""

# TODO cleanup

types =
[
{_encoded = "String", _decoded = :string, _args = []},
Expand Down
Loading

0 comments on commit 1b7919f

Please sign in to comment.