Skip to content

Just How Slow is Data Retrieval via JDBC

esProcSPL edited this page May 28, 2024 · 1 revision

Slow database JDBC

A JAVA program retrieves data from the database through JDBC. The retrieval speed is slow even sometimes the database load is not heavy at all and the SQL statement is simple.

Now let’s do a speed testing (take Oracle as the example) to better understand this.

Source data

Data is generated using the TPC-H data generator. We use the customer table to perform the test. The table has 1500,0000 rows and 8 fields. The original text file name is customer.tbl and has a size of 2.3G. We are trying to load data from the text file to a data table in Oracle using the data loading tool provided by the database.

Test environment

Two Intel3014 processors of 1.7 G. Each CPU has 6 cores.

SSD: 1T, 561MB/s read and 523MB/s write

Interface: SATA 6.0Gb/s

RAM: 64G

OS: Linux CentOS 7

The whole test is performed locally on the server without any network transmission.

Database retrieval test

We are trying to carry out the data retrieval using the SQL statement through Oracle JDBC.

It is complicated to write the retrieval in Java. Instead, we do the test using a SPL (Strucrtured Process Language) script:

A B
1 =now() / Record the current time
2 =connect("oracle") / Connect to the oracle database
3 =A2.cursor("SELECT * FROM CUSTOMER") / Create a cursor from which data will be fetched
4 for A3,10000 / Retrieve data circularly, 10000 records each time
5 =A2.close() / Close database connection
6 =interval@s(A1,now()) / Calculate the time spent in data retrieval

Test result: 275 seconds

Test on a text file

To let you have more perceptual understanding about this, we will take the text file to do the test as we did in above. We read data from the text file and parse records without performing any specific computations.

Write the following test SPL script:

A B
1 =now() / Record the current time
2 =file("customer.tbl") / Create a file object
3 =A2.cursor(;," ")
4 for A3,10000 / Retrieve data circularly, 10000 records each time
5 =interval@s(A1,now()) / Calculate the time spent in data retrieval

Test result: 43 seconds

Reading a text file is 275/43=6.4 times faster than reading data from the Oracle database.

Text parsing is complicated though, reading data from a text file is far faster than retrieving data from the database.

Speeding up data retrieval using parallel processing with esProc SPL

When data can only be retrieved from the database and the database is not heavy-loaded, the multi-CPU-based parallel processing can be used to accelerate data retrieval. It is a great hassle to write the algorithm in Java because a series of issues, including resource sharing and collision, must be considered.

SPL’s parallel processing technique can enhance the database JDBC’s retrieval performance and avoid the complex JAVA hardcoding and facilitate concatenation of the result sets returned by multiple threads.

Below are our tests.

Parallel retrieval on a single table

Here we still use the above customer table. Data is retrieved faster using the parallel processing. Below is the SPL code:

A B
1 =now() / Record the current time
2 =connect("oracle").query@ix("SELECT COUNT(*) FROM CUSTOMER")(1)
3 >n=12 / Define the number of parallel threads
4 =n.([int(A2/n)*(~-1),int(A2/n)*~]) / Divide the table into segments according to the number of parallel threads
5 fork A4 =connect("oracle")
6 =B5.query@x("SELECT * FROM CUSTOMER WHERE C_CUSTKEY>? AND C_CUSTKEY<=?",A5(1),A5(2))
7 =A5.conj() / Concatenate result sets returned by threads
8 =interval@s(A1,now()) / Calculate the time spent in data retrieval

Test result: 28 seconds (275 seconds without parallel processing)

To perform the parallel retrieval, we need to divide the source data into relatively even intervals. In this example, C_CUSTKEY field contains natural numbers beginning from 1. We first count the total records (A2) and then split them into n segments evenly (A4). A5 performs the parallel retrieval, where each thread connects to the database and execute SQL using the intervals of C_CUSTKEY values as the parameter. Finally, result sets of these multiple threads are concatenated to get the final result.

In real-world situations, it is probably that we need a certain method to set the WHERE condition to get nearly even intervals.

Parallel retrieval on multiple tables

We can also use parallel processing to make multi-SQL retrieval faster.

In this case, data is TPC-H generated and the total volume decreases to 5G. We use 5 tables in this test. Below is the SPL script without using parallel processing. The result is unsatisfactory.

A B
1 SELECT * FROM SUPPLIER
2 SELECT * FROM PART
3 SELECT * FROM CUSTOMER
4 SELECT * FROM PARTSUPP
5 SELECT * FROM ORDERS
6 =now() / Record the current time
7 =connect("oracle") /Connect to the database
8 =[A1:A5].(A7.query(~)) / Execute SQL statements one by one in turn
9 >A7.close() /Close database connection
10 =interval@ms(A6,now()) /Calculate the time spent in data retrieval

Test result: 360 seconds

While performance is considerably improved after we employ parallel retrieval. Below is the code:

A B
11 =now() / Record the current time
12 fork [A1:A5] =connect("oracle")
13 =B12.query@x(A12)
14 =interval@ms(A11,now()) /Calculate the time spent in data retrieval

Test result: 167 seconds

Though data probably cannot evenly be divided in the multi-table scenarios, parallel processing is still an effective way to increase retrieval performance significantly.

esProc SPL is an opensource software developed in Java, it provides JDBC interface and is very easy to be integrated in Java Applications.

Clone this wiki locally