forked from antonmks/Alenka
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmanual.txt
50 lines (33 loc) · 2.49 KB
/
manual.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
Short introduction :
Alenka supports several data types : integers, floats, decimals and character strings.
The difference between floats and decimals is that the latter are compressed when written to a file.
Requirements : you need to have a model 400 or better of NVidia GPU.
Limitations ( hopefully temporary) :
1. if you use AVG function the Group By statement must include COUNT(FIELD) expression .
2. In LOAD statement the columns should be listed in the order they have in a source file.
Loading data for processing :
Alenka can read text files with fields separated by separators. But much faster way to read the data would be to load it into binary files as demonstrated by the following script :
file1.sql :
A := LOAD 'nation.tbl' USING ('|') AS (n_nationkey{1}:int, n_name{2}:varchar(25), n_regionkey{3}:int);
STORE A INTO 'nation' BINARY;
File2.sql
A := LOAD 'region.tbl' USING ('|') AS (r_regionkey{1}:int, r_name{2}:varchar(25));
STORE A INTO 'region' BINARY;
This script read file date.tbl with fields separated by tabs. Field d_datekey is the first field in a file, d_month - fourth field and so on.
Since alenka is a columnar database the fields will be compressed and stored in separate files with names like date.1, date.4 etc.
Compression. In Alenka compression and decompression is transparent to the user. Alenka uses FOR(frame of reference), FOR-DELTA and dictionary compression.
Decimal and integer fields are compressed by FOR and FOR-DELTA(when data are already sorted). Strings are compressed by dictionary compression.
Like all other oparations compression and decompression are done on the GPU.
Once we created the data files we can do the data processing :
RF := FILTER region BY r_name == "EUROPE";
J_N := SELECT n_nationkey AS n_nationkey, n_name AS n_name
FROM nation JOIN RF ON n_regionkey = r_regionkey;
STORE J_N INTO 'mytest.txt' USING ('|') LIMIT 100;
To compile alenka you need to have NVidia's CUDA installed. You also need to download and use the latest(1.7) version of Thrust.
oins use Sean Baxter's excellent moderngpu library(https://github.com/NVlabs/moderngpu) so you need to download and unzip it into the source directory as well.
To compile alenka run make on Linux or nmake on Windows.
Alenka have been tested on a 64bit Windows7 and 64bit Linux.
When you compiled alenka you can run it from a command line using a SQL script file as a parameter : alenka.exe q1.sql
Or you can use Alenka's JDBC driver.
Alenka is distributed under Apache 2 license.