Skip to content

SPL:The Sharpest Tool for Handling Post SQL Computations

esProcSPL edited this page Feb 21, 2024 · 1 revision

During contemporary application development, SQL is generally used to achieve simple data loading and saving while most part of the computing procedure and business logic is implemented in the application. This is because:

  • Complicated SQL statements are hard to write, read, debug and modify;

  • There are different SQL dialects, and heavy use of SQL can lead to hard to migrate code;

  • The framework requires that business logic be achieved in the application rather than in the database in order to maintain loose coupling;

  • SQL is not good at handling certain computations, including complex set-oriented computations, order-based computations, join operations, and multi-step computations, and these computations need to be moved out of the database;

  • Stored procedures have side effects – hard-to-migrate code, tight coupling, and data security problem – when they are used to implement procedural control;
    Besides, it is hard to use SQL to deal with computing scenarios involving multiple databases and or non-database sources. The computations can only be handled outside databases.

In view of this, it is important to implement post-SQL computations in the application.

Generally, SQL returns structured data. So, a good post-SQL computation technology should have a convenient structured data object for further computing and processing data returned by SQL, offer enough library functions and have computing ability at least as good as SQL, and ideally, support loop and branching statements to achieve procedural control. Additionally, the technology needs to be integration-friendly since it is designed to work in an application.

Important as Java is in program development, the language’s JDK offers too basic methods. They can achieve post-SQL computations but the efficiency is too low.

ORM is a common technique used in Java to accomplish post-SQL computations. But all popular ORM products lack specialized structured data object and do not support dynamic data structure. Though they can make use of Java to achieve procedural control, achieving computations flexibly is still hard. They, with the very limited number of computing functions, offer computing abilities not nearly as good as SQL ability, and as a result, often need to resort to hardcoding in Java.

Stream’s chain method, more object-oriented than ORM HQL and backed by Lambda syntax, is often used to process data retrieved by SQL. Some ORM products can generate Stream object directly. Stream does not have specialized structured data object and support dynamic data structure either. It also has weak computing ability, even weaker than ORM, as it needs a lot of hardcoding even for basic computations like sorting, grouping & aggregation and joins.

The JVM-based Kotlin can be considered the upgraded version of Stream with improvements on chain method and Lambda syntax, and can also be used for handling post-SQL computations. But based on a compiled language, it could only make minor adjustments and inherited all major demerits of its precursor.

Python Pandas has strong structured data computing ability and is sometimes used to deal with post-SQL computations. Yet, it is hard to be integrated into a Java program due to the lack of an easy-to-use interface and is seldom used in formal projects.

We need a more powerful, more efficient and easier-to-use technology to approach post-SQL computations. esProc SPL is what we wish for.

Specialized structured data object

SPL is an open-source structured data computing engine under JVM, with specialized built-in structured data object – table sequence. It supports convenient switch between a table sequence and a database table/record and dynamic data structure, and offers flexible and easy to use access methods, maintenance methods and computing functions. Table sequences are sufficiently competent that they give solid low-level support for data computing and procedural control, making it convenient to achieve various business logics in post-SQL computations.

Direct SPL-database exchange methods enable easy conversion between database table (SQL result set) and SPL table sequence.

To execute SQL using query function and generate a table sequence containing one single record, for instance:


A B
1 =connect("orcl") //Connect to database
2 .query("select * from sales where orderid=?",201)=r=A1 //Query a single record
3 =db.close() //Close database connection

The query function will automatically generate a table sequence made up of multiple records when SQL returns multiple records:

=T=A1.query("select * from salesR where SellerID=?",10)

It’s also simple to perform the operation the other way around. SPL has update function for storing records of a table sequence into the database permanently. Suppose the original table sequence is T and table sequence NT is the one after it undergoes addition, deletion and modification on records. To write the updated table sequence to the database:

=A1.update(NT:T,sales;ORDERID)

Flexible table sequence access methods enable convenient access to a table sequence by field name or record number.

Get the 3rdrecord of a table sequence: T(3)

Get the last three records backwards: T.m([-1,-2,-3])

Calculate a field value in a specified record: T(3).Amount*0.05

Get a column and return it as a set: T.(Amount)

Get multiple columns and return them as a set of sets: T.([CLIENT,AMOUNT])

Get a value first by field and then by record number: T.(AMOUNT)(3)

Get a value first by record number and then by field: T(3).AMOUNT

Easy to use table sequence maintenance methods enable to add, delete and modify one or more records in a uniform way.

Append a record: T.insert(200,"APPL",10,2400.4,date("2010-10-10"))

Modify a record: T(3).Amount = T(3). Amount*1.05

Delete records: T.delete(T.select(ORDERID==209 || ORDERID==208))

Rich computing functions on table sequences enable all-around SQL-style computations.

Filter: T.select(Amount>1000 && Amount<=3000 && like(Client,"*bro*"))

Sort: T.sort(-Client,Amount)

Distinct: T.id(Client)

Aggregate: T.max(Amount)

Filter after groping & aggregation:T.groups(year(OrderDate),Client; avg(Amount):amt).select(amt>2000)

Join: join(Orders:o,SellerId ; Employees:e,EId).groups(e.Dept; sum(o.Amount))

Intersection: T1.id(Client) ^ T2.id(Client)

Get TopN:T.top(-3;Amount)

Get TopN in each group: T.groups(Client;top(3,Amount))

Dynamic data structure enables to infer the new data structure according to the current result and automatically generate a new table sequence, which can be computed directly. To first group and summarize data, and then filter, and finally sort data, for instance:

T.groups(SellerId, Client; sum(Amount):amt, count(1):cnt).select(amt>10000 && amt<=30000 && like(Client,"*bro*")).sort(amt)

Table sequences supporting dynamic data structure let programmers focus on computation itself by saving them from the trouble of defining result set beforehand. The coding style is concise, easy to understand, and more aligned with the natural way of reasoning. It helps to considerably increase development efficiency. The dynamic data structure has more prominent advantages particularly for expressing complex multi-step business logics.

Powerful structured data computing ability

SPL offers many convenient and easy-to-use syntax and powerful built-in functions that can simplify complex order-based computations, set-oriented computations, distributed computing and join operations. It can achieve many computations both SQL and stored procedures find hard to handle.

SPL offers convenient to use syntax like function option, which allows functions with similar functionalities to use same name and distinguishes them with different options, and hierarchical parameters. The basic functionality of select function is to filter data. If we need to get the first eligible record, we use @1 option:

T.select@1(Amount>1000)

When the size of data to be filtered is large, we use @m option to implement parallel processing:

T.select@m(Amount>1000)

We use @b option to perform a quick filtering using the binary search when the original data is ordered:

T.select@b(Amount>1000)

We also use @b option to perform order-based grouping, which, based on data ordered by the grouping field, puts neighboring records having same value under a specific field into the same group:

T.groups@b(Client;sum(Amount))

Function options can work together:

Orders.select@1b(Amount>1000)

Usually, parameters in a structured computation function are complicated. SQL, for instance, uses a lot of keywords to divide parameters of a statement into multiple groups, causing inconsistent statement structure. SPL has hierarchical parameters. It employs semicolon, comma and colon to identify parameters to three levels, and thus writes complicated parameters in a simple way. For example:

join(Orders:o,SellerId ; Employees:e,EId)

SPL supplies a large number of built-in date functions and string functions , far more than each of other technologies, including SQL, provides and much more powerful than them. This enables SPL to implement same computations with significantly shorter code. For example:

Time functions:

Get the date before or after a specified date: elapse("2020-02-27",5) //Return 2020-03-03

Find the day of the week of a specified date: day@w("2020-02-27") //Return 5, which is Thursday

Get the date N workdays after: workday(date("2022-01-01"),25) //Return 2022-02-04

String functions:

Check whether a string all consists of letters: isdigit("12345") //Return true

Get a string before a specified substring: substr@l("abCDcdef","cd") //Return abCD

Split a string into an array of substrings by vertical bar: "aa|bb|cc".split("|") //Return ["aa","bb","cc"]

SPL also offers functions to get a date before or after a number of years, get the ordinal number of a date in the year, get which quarter the date belongs to, split a string according to a regular expression, get the where or select part of a SQL statement, get words from a string, split HTML by the specific marker, etc.

SPL is good at simplifying the traditional complex code for order-based computations . The inter-row computations are usually not easy, like the calculations of link relative ratio and YOY. SPL uses "field[relative position]" to reference another row or a value in it. This produces concise code. It automatically handles special cases, such as out-of-range values in an array, and thus is more convenient than SQL. To append a computed column rate containing amount growth rate for each order, for instance:

=T.derive(AMOUNT/AMOUNT[-1]-1: rate)

Through the cooperation of position-oriented expressions and order-oriented functions, SPL can accomplish order-based computations that SQL finds hard to handle. To find students who attend school for at least 7 hours each day in the continuous 4 weeks according to the attendance table:


A
1 =connect("mysql")
2 =A1.query@x("SELECT SID,ATTDATE,DURATION,null AS W FROM STUTEST WHERE DURATION>=7 ORDER BY SID,ATTDATE").run(W=pdate@w(ATTDATE))
3 =A2.group@o(SID;~.groups@o(W;count(~):CNT).select(CNT==7).group@i(W-W[-1]!=7).max(~.len()):weeks)
4 =A3.select(weeks>=4).(SID)

SPL streamlines conventionally complex code for set-oriented computations . Having table sequences founded on thorough set-orientation, agile syntax and powerful set-oriented functions, SPL can express complex logics of set-oriented computations with unprecedentedly simple code. To find employees who are below the average age of their department for each department, for instance:


A
1 …//Skip the computing process for generating table sequence Employees
2 =Employees.group(DEPT; (a=~.avg(age(BIRTHDAY)),~.select(age(BIRTHDAY)<a)):YOUNG)
3 =A2.conj(YOUNG)

To count the longest days when a stock rises continuously:


A
1 …//Skip the computing process for generating table sequence AAPL
2 =a=0,AAPL.max(a=if(price>price[-1],a+1,0))

It is convenient to implement stepwise coding in SPL . With table sequences founded on thorough set-orientation, SPL can use a variable to represent a table sequence. This makes it fit for implementing stepwise programming. It is really easy to achieve many set-oriented computations that SQL finds it hard to handle. To find n big customers whose order amounts to at least half of the total and sort the corresponding records by amount in descending order, for instance:


A B
1 //Skip data retrieval process
2 =A1.sort(amount:-1) /Sort records by order amount in descending order (can be done in SQL)
3 =A2.cumulate(amount) /Get a sequence of cumulative amounts
4 =A3.m(-1)/2 /Calculate the final cumulative amount, i.e., the total
5 =A3.pselect(~>=A4) /Get position where the amount reaches half of the total
6 =A2(to(A5)) /Get records by position

SPL streamlines traditionally complex code for associative computations . The table sequence’s professionalism is shown in many aspects. One is the support of using object reference to represent association. Developers can increase their efficiency by intuitively accessing the associated table through a dot. It is easy for SPL to express associative computations SQL finds it hard to phrase on many occasions. To find male employees whose managers are females according to the employee table, for instance:

=employees.select(gender:"male",dept.manager.gender:"female")

Flexible procedural control ability

SPL offers flexible and easy to use branching statements and loop statements. By working with SPL specialized structured data object, they help to implement various business logics conveniently.

SPL branching statements :


A B
2
3 if T.AMOUNT>10000 =T.BONUS=T.AMOUNT*0.05
4 else if T.AMOUNT>=5000 && T.AMOUNT<10000 =T.BONUS=T.AMOUNT*0.03
5 else if T.AMOUNT>=2000 && T.AMOUNT<5000 =T.BONUS=T.AMOUNT*0.02

SPL loop statements :


A B
1 =db=connect("db")
2 =T=db.query@x("select * from sales where SellerID=? order by OrderDate",9)
3 for T =A3.BONUS=A3.BONUS+A3.AMOUNT*0.01
4
=A3.CLIENT=CONCAT(LEFT(A3.CLIENT,4), "co.,ltd.")
5
 …

Similar to loop in Java, SPL also uses the break keyword to exit (terminate) the current loop body and the next keyword to skip (ignore) the current round of loop.

The procedural control statement working in coordination with the table sequence enable SPL to achieve business logics, including database read/write, transaction handling, procedural control and data computation, in a consistent and uniform syntax style . To calculate bonus according to the specified rule, for instance:


A B C
1 =db=connect@e("dbName")
/Connect to database and begin transaction
2 =db.query@1("select sum(Amount) from sales where
sellerID=? and year(OrderDate)=? and month(OrderDate)=?",
p_SellerID,year(now()),month(now()))

/Query sales amount of the current month
3 =if(A2>=10000 :200, A2<10000 && A2>=2000 :100, 0)
/ The accumulative bonus of the current month
4 =p_Amount*0.05
/Fixed bonus for the current order
5 =BONUS=A3+A4
/Total bonus
6 =create(ORDERID,CLIENT,SELLERID,AMOUNT,BONUS,ORDERDATE)
/Create data structure of the orders table
7 =A6.record([p_OrderID,p_Client,p_SellerID,p_Amount,BONUS,
date(now())])

/Generate an orders record
8 >db.update@ik(A7,sales;ORDERID)
/Try to update the orders record to a database table
9 =db.error()
/Database write result
10 if A9==0 >A1.commit() /Commit the transaction if the write action succeeds
11 else >A1.rollback() /Rollback the transaction if the write action fails
12 >db.close()
/Close database connection
13 return A9
/Return database write result

Optimal framework

SPL offers JDBC driver, and allows placing code outside of the Java program, which reduces system coupling. The interpreted execution SPL supports outside-database computations, code migration and cross-database/-data-source computations. All these features enable it to achieve the optimal framework for pos-SQL computations.

SPL has easy to use JDBC driver and can be integrated into Java seamlessly. To save the previous SPL code as a script file and invoke the file name in the way of invoking a stored procedure:

Class.forName("com.esproc.jdbc.InternalDriver");
Connection connection =DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = connection.createStatement();
ResultSet result = statement.executeQuery("call writeBonus()");

By storing computing code separately from the Java program and invoking it by the file name, SPL becomes both database- and Java-independent. Separating the business logic from the front-end application helps to create loosely coupling framework.

Interpreted execution and hot-swap . Business logics are various and volatile. An ideal system framework should be able to handle changeable business logics. The essence of ORM is Java code and ORM code needs to be compiled before execution. You need downtime when any new changes are getting deployed. The procedure for handling dynamic business logics is complicated. SPL is Java-based interpreted language that executes in real-time without recompilation. Any changes of the script take effect instantly with no downtime. The hot-swappable feature makes SPL particularly suitable for dealing with volatile business logics.

Ease of migration . SPL retrieves data from the database according to the source name. To migrate code from one data source to another, you just need to change the data source configurations in SPL’s configuration file rather than modifying the code. SPL supports dynamic data source, which can switch from one to another through a parameter or macro, to achieve ease of migration. To make the migration more convenient, SPL also provides standard SQL syntax independent of specific database products. You can use sqltranslate function to translate the standard SQL into a mainstream SQL dialect and then use query function to execute.

Ease of administration and operation . As SPL supports outside-database computations, SPL code can be managed by a third-party tool, which facilitates teamwork. SPL scripts can be stored in different file directories, which is convenient and flexible and helps to create cost-effective management. SPL requires database privileges similar to Java, which won’t affect data security.

Support of cross-database/data-source computations . SPL can retrieve data from any data source (including RDB) as a table sequence and compute it in uniform syntax. This enables it to achieve cross-database computations conveniently. To perform an inner join between data in MySQL and data in Oracle, for instance:


A
1 =mysql1.query("select SellerId, sum(Amount) subtotal from Orders group by SellerId")
2 =orcl.query("select EId,Name from employees")
3 =join(A1:O,SellerId; A2:E,EId).new(O.Name:name, O.Dept:dept, E.subtotal:amt)

SPL supports many types of non-RDB sources, including files like txt, csv and xls, NoSQL like MongoDB, Hadoop,Redis,ElasticSearch,Kafkaand Cassandra, and hierarchical data such as WebService, XML and Restful JSON, and is able to handle mixed computations between RDB and non-RDB.

Both ORM products and Stream/Kotlin lack specialized structured data objects and enough computing ability. Python Pandas is hard to be integrated into a Java application. SPL offers built-in specialized structured data object, remarkable structured data computing capability and flexible procedure control ability for accomplishing various business logics conveniently and efficiently. SPL provides easy to use JDBC driver to be conveniently integrated in Java programs. It also has framework advantages, including loose coupling, interpreted execution and hot swap, support of outside-database computations and cross-database/-data-source mixed computations.

Clone this wiki locally