-
Notifications
You must be signed in to change notification settings - Fork 332
SPL:Static Transposition
The transposition function is commonly presented in front-end such as reports, which transposes the queried data into a specific display layout. For example, row-to-column transposition, column-to-row transposition, and more complex dynamic transposition, etc. And static transposition means that the structure of the transposed table is determined in advance, and the target structure will not change dynamically according to the data.
As the name implies, this functionality is used to perform the transposition from row to column, that is, taking the values in the rows as the names of columns. In practice, the application of row-to-column transposition usually follows the grouping and aggregation operations. Through grouping, we process the data of rows in the column to be transposed into distinct values, and then display the values of rows as the names of the columns. In fact, the function of row-to-column transposition is to subdivide the aggregation result of one column into a number of more specific column aggregation results for a more intuitive display effect.
[e.g.1] According to the students’ grade table, query the highest grades of each subject in each class and display them by column. And some of the data are as follows:
CLASS | STUDENTID | SUBJECT | SCORE |
---|---|---|---|
1 | 1 | English | 84 |
1 | 1 | Math | 77 |
1 | 1 | PE | 69 |
1 | 2 | English | 81 |
1 | 2 | Math | 80 |
… | … | … | … |
The results are expected to be displayed in the following form:
CLASS | MAX_MATH | MAX_ENGLISH | MAX_PE |
---|---|---|---|
1 | 97 | 96 | 97 |
2 | 97 | 96 | 97 |
… | … | … | … |
The A.pivot() function is provided in SPL for transposition, and the default is row-to-column transposition.
The SPL script is:
A | |
---|---|
1 | =T("Scores.csv") |
2 | =A1.groups(CLASS,SUBJECT; max(SCORE):MAX_SCORE) |
3 | =A2.pivot(CLASS; SUBJECT, MAX_SCORE; "Math":"MAX_MATH", "English":"MAX_ENGLISH", "PE":"MAX_PE") |
A1: import the students’ grade table from the file.
A2: group and aggregate the records to get the highest grade of each subject in each class.
A3: use the A.pivot() function to transpose the highest grade of each subject in each class from row to column.
In the opposite of row-to-column transposition, column-to-row transposition needs to generate a new row for each field to be transposed, and the field name or corresponding name is converted to the value of the new field, and the original field value is converted to the value of another new field.
[e.g.2] Generate a list of each medal based on the Olympic medal table. And some of the data are as follows:
Game | Nation | Gold | Silver | Copper |
---|---|---|---|---|
30 | USA | 46 | 29 | 29 |
30 | China | 38 | 27 | 23 |
30 | UK | 29 | 17 | 19 |
30 | Russia | 24 | 26 | 32 |
30 | Korea | 13 | 8 | 7 |
… | … | … | … | … |
The results are expected to be displayed in the following form:
GAME | NATION | MEDAL_TYPE | MEDALS |
---|---|---|---|
30 | USA | GOLD | 46 |
30 | USA | SILVER | 29 |
30 | USA | COPPER | 29 |
30 | China | GOLD | 38 |
30 | China | SILVER | 27 |
30 | China | COPPER | 23 |
… | … | … | … |
The @r option of the A.pivot() function is used to transpose columns to rows.
The SPL script is:
A | |
---|---|
1 | =T("Olympic.txt") |
2 | =A1.pivot@r(GAME,NATION; MEDAL_TYPE, MEDALS; GOLD, SILVER, COPPER) |
A1: import the Olympic medal table.
A2: use the A.pivot@r() function to transpose fields GOLD, SILVER, COPPER from columns to rows.
Sometimes we need to use the value of a row as the column name, and convert the column name to a field value.
[e.g.3] The sales table classified by sales channel is recorded according to the year and month. And some of the data are as follows:
YEAR | MONTH | ONLINE | STORE |
---|---|---|---|
2020 | 1 | 2440 | 3746.2 |
2020 | 2 | 1863.4 | 448.0 |
2020 | 3 | 1813.0 | 624.8 |
2020 | 4 | 670.8 | 2464.8 |
2020 | 5 | 3730.0 | 724.5 |
… | … | … | … |
To query the sales amount of each channel in every month, and the results are expected to be displayed in the following form:
CATEGORY | 1 | 2 | 3 | … |
---|---|---|---|---|
ONLINE | 2440 | 1863.4 | 1813.0 | … |
STORE | 3746.2 | 448.0 | 624.8 | … |
Logically, we first use the A.pivot@r() function to perform column-to-row transposition and then use the A.pivot() function to transpose rows to columns.
The SPL script is:
A | |
---|---|
1 | =T("MonthSales.csv").select(YEAR:2020) |
2 | =A1.pivot@r(YEAR,MONTH; CATEGORY, AMOUNT) |
3 | =A2.pivot(CATEGORY; MONTH, AMOUNT) |
A1: import the sales table, and select the records of year 2020.
A2: use the A.pivot@r() function to perform column-to-row transposition, converting the channel type of data to the field value of CATAGORY. The results are as follows:
YEAR | MONTH | CATEGORY | AMOUNT |
---|---|---|---|
2020 | 1 | ONLINE | 2440 |
2020 | 1 | STORE | 3746.2 |
2020 | 2 | ONLINE | 1863.4 |
2020 | 2 | STORE | 448.0 |
2020 | 3 | ONLINE | 1813.0 |
2020 | 3 | STORE | 624.8 |
… | … | … | … |
A3: use the A.pivot() function to transpose the values of the month field to columns.
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code