-
Notifications
You must be signed in to change notification settings - Fork 332
Computing engine on WEB
The data interfaces on the Web are mainly restful and WebService, and the data formats usually are hierarchical JSON and XML. Hierarchical data can carry more general and richer information, yet it is more complex in structure and more difficult to compute than traditional two-dimensional data. There seem to be many tools or engines available for Web computing, but each has its own shortcomings. Specifically, the libraries like JsonPath/XPath are good at parsing but not good at computing; Python Pandas is relatively powerful in computing, but it is difficult to integrate it into Java and, since its data object DataFrame is not specifically designed for hierarchical data, it will result in a difficulty to code when the calculation task is complex; Scala Spark is better in integration but heavy in framework and difficult to learn.
For computing engines on Web, there is a better choice: esProc SPL.
esProc SPL, as an JVM-based open-source programming language, offers built-in specialized hierarchical data objects and convenient Web parsing/generating functions that can simplify complex hierarchical data calculation, and it is easy to integrate SPL into Java applications.
SPL offers table sequence – the built-in specialized hierarchical data object, which is suitable for carrying Web data formats and good at expressing complex hierarchical relationships.
For example, perform conditional query on a Json string, and return the result in the form of Json string:
A | B | |
---|---|---|
1 | =json(p_JsonStr) | Parse Jason string as SPL table sequence |
2 | =A1.conj(Orders) | Merge lower-level records |
3 | =A2.select(Amount>1000 && Amount<=2000) | Conditional query |
4 | =json(A3) | Convert result to Json string |
A1: the external parameter p_JsonStr is a Json string, and the function json converts it to a table sequence (SPL’s structured data object). In SPL’s IDE, click cell A1, we can see a hierarchical table sequence structure, where the fields like Eid and State store the data of simple types, and Orders field stores the set of records (two-dimensional table). Click a row under Orders, we can expand the value to view details:
A2, A3: calculate the table sequence, and the calculation result is also a table sequence. A4: the function json is capable of either converting a Json string to table sequences or vice-versa.
SPL provides convenient hierarchical data access methods, allowing us to locate the data on different level through dot, and the data at different position through index.
Get a set of values of a single field on level 1: A1.(Client)
Get a set of values of multiple fields on level 1: A1.(\[Client,Name\])
Get the 10th record on level 1: A1(10)
Get Orders field of the 10th record on level 1 (all lower-level records): A1(10).Orders
Get a set of values of a single field under Orders field of the 10th record on level 1: (A1(10).Orders).(Amount)
Get the 5th record of Orders field of the 10th record on level 1: (A1(10).Orders)(5)
Get 10th to 20th records on level 1: A1(to(10,20))
Get the last three records on level 1: A1.m(\[-1,-2,-3\])
As a high-level data object independent of string format, the table sequence can parse both Json and XML strings, and the calculation code is universal. For example, p_XMLStr is a hierarchical XML string (having same structure as the previous Json), when we perform the same conditional query on it, we only need to modify A1 to: =xml(p_XMLStr,"xml/row")
.
Similar to function json, function xml also supports bi-directional conversion.
SPL provides various Web data interfaces, allowing us to access restful or WebService conveniently, and parse them as uniform table sequence. For example, when we want to access restful interface, we only need to modify A1 in the previous example to:
A | |
---|---|
1 | =httpfile("http://127.0.0.1:6868/restful/emp_orders").read()) |
2 | =json(A1) |
The function httpfile is used to access URL address. With this function, we can specify the port, Header, character set, IPV4/V6, Post/Get. In addition, it supports authentication methods using cookies or tokens.
Access WebService interface:
A | |
---|---|
1 | =ws_client("http://127.0.0.1:6868/ws/RQWebService.asmx?wsdl") |
2 | =ws_call(A1,"RQWebService":"RQWebServiceSoap":"getEmp_orders") |
The function ws_client is used to establish WebService client, and the function ws_call is used to query WebService service.
Since the table sequence has nothing to do with the string format and data source, the following examples are generally applicable to Json\XML\parameter string\Web, unless otherwise specified. It is not difficult to imagine that SPL can also parse local Json files or XML files. Since two-dimensional data can be regarded as simplified hierarchical data, the table sequence can also parse and calculate csv file or database table (this is not the point of this article).
For the parsed table sequence, SPL offers rich computing functions, making it possible to easily accomplish everyday SQL-style calculations.
A | B | |
---|---|---|
1 | … | Generate hierarchical table sequence |
2 | =A1.conj(Orders) | Merge lower-level records |
3 | =A2.groups(State,Gender;avg(Salary),count(1)) | Group and aggregate multiple fields |
4 | =A1.new(Name,Gender,Dept,Orders.OrderID,Orders.Client,Orders.Client,Orders.SellerId,Orders.Amount,Orders.OrderDate) | Associate |
5 | =A1.sort(Salary) | Sort |
6 | =A1.id(State) | Distinct |
7 | =A2.top(-3;Amount) | topN |
8 | =A2.groups(Client;top(3,Amount)) | In-group TopN (window function) |
In addition, SPL provides the syntax that conforms to SQL92 standard, and supports the set calculation, case when, with, nested subqueries, etc.
SPL is highly expressive and suitable for computing hierarchical data with complex structure. For example: restful returns hierarchical Json data containing multiple subdocuments, and the structure is complex. Some of the data are as follows:
[
{
"race": {
"raceId":"1.33.1141109.2",
"meetingId":"1.33.1141109"
},
...
"numberOfRunners": 2,
"runners": [
{ "horseId":"1.00387464",
"trainer": {
"trainerId":"1.00034060"
},
"ownerColours":"Maroon,pink,dark blue."
},
{ "horseId":"1.00373620",
"trainer": {
"trainerId":"1.00010997"
},
"ownerColours":"Black,Maroon,green,pink."
}
]
},
...
]
Now we want to group and aggregate the data on different levels (group by trainerId and count the number of ownerColours members in each group). It is hard to handle the computation if we use common way directly in Java, SPL is much simpler:
A | |
---|---|
1 | … |
2 | =A1(1).runners |
3 | =A2.groups(trainer.trainerId; ownerColours.array().count():times) |
SPL supports logically complex calculations such as stepwise calculation, ordered calculation and calculation after grouping. Many calculations that are difficult to handle in SQL/stored procedures can be easily solved in SPL. For example, find out the top n customers whose cumulative sales account for half of the total sales, and sort them by sales in descending order:
A | B | |
---|---|---|
1 | … | /Fetch data |
2 | =A1.sort(amount:-1) | /Sort by sales in descending order |
3 | =A2.cumulate(amount) | /Get a sequence of cumulative amounts |
4 | =A3.m(-1)/2 | /Calculate the final accumulative amount, i.e., the total |
5 | =A3.pselect(~>=A4) | /Find the position where the amount exceeds half of the total |
6 | =A2(to(A5)) | /Get target values by position |
SPL provides a wealth of date and string functions, making it possible to effectively simplify related calculations.
Get the date before or after a specified number of quarters: elapse@q("2020-02-27",-3) // Return 2019-05-27
Get the date after N workdays: workday(date("2022-01-01"),25) // Return 2022-02-04
String functions: check if a string consists entirely of numbers: isdigit("12345") // Return true
Get a string before a specified substring: substr@l("abCDcdef","cd") // Return abCD
Split a string into an array of substrings by vertical bar: "aa|bb|cc".split("|") // Return ["aa","bb","cc"]
SPL also offers many other functions, for example, get a date before or after specified number of years, get which quarter the date belongs to, split a string according to a regular expression, get words from a string, split HTML by the specific marker, etc.
It is worth mentioning that in order to further improve development efficiency, SPL invents unique function syntax. For example, we can use option to distinguish similar functions. If we only want to filter out the first record that meets condition, we can use the option @1:
T.select@1(Amount>1000)
If we want to search for the first record from back to front, we can use the option @z:
T.select@z1(Amount>1000)
SPL provides JDBC driver, through which it can be easily integrated into Java code: Simple SPL code can be embedded directly in Java like SQL:
String jsonStr=… //Json string
Class.forName("com.esproc.jdbc.InternalDriver");
Connection connection =DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = connection.createStatement();
String query="=json(json($["+jsonStr+"]).conj(Orders).select((Amount>1000 && Amount<=2000)))";
ResultSet result = statement.executeQuery(query);
For complex SPL code, we can save it as a script file first, and then invoke it by a Java program in the way of invoking a stored procedure, which can effectively reduce the coupling between computing code and front-end application.
Class.forName("com.esproc.jdbc.InternalDriver");
Connection conn =DriverManager.getConnection("jdbc:esproc:local://");
CallableStatement statement = conn.prepareCall("{call getRestful(?, ?)}");
statement.setObject(1, "2020-01-01");
statement.setObject(2, "2020-01-31");
statement.execute();
SPL is an interpreted language, and the outside-application SPL code can be executed without compiling. In addition, SPL supports non-stop hot deployment, this makes it suitable for the changing business logic, and makes O&M complexity low.
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code