-
Notifications
You must be signed in to change notification settings - Fork 335
SPL:taking the record of maximum values maximum values in group
Sometimes, we don’t care about what the exact maximum value is, we are rather concerned about the record in which the maximum value is located. For example, look up the student ID with the highest math score, select the order with the highest monthly sales, and so on.
To get the record where the maximum value is, we need to find out which row the maximum value is in (possibly multiple rows if there are multiple maxima), and then return to the corresponding record. SPL provides the A.maxp() function which is used to select the record where the maximum value is located.
[e.g. 1] According to the report card, find the ID of the student with the highest math score in Class One. Some of the figures are as follows:
CLASS | STUDENTID | SUBJECT | SCORE |
---|---|---|---|
1 | 1 | English | 84 |
1 | 1 | Math | 77 |
1 | 1 | PE | 69 |
1 | 2 | English | 81 |
1 | 2 | Math | 80 |
… | … | … | … |
SPL script is as follows:
A | |
---|---|
1 | =T("Scores.csv").select(SUBJECT=="Math"&&CLASS==1) |
2 | =A1.maxp(SCORE).STUDENTID |
A1: import the score table and query the math grades of Class One.
A2: employ A.maxp() function to select the record with the highest score and extract the student ID from the record.
The record where the maximum value is located is not necessarily exclusive. If we want to get all the records with the highest scores, the option @a of A.maxp() function could be used:
A | |
---|---|
2 | =A1.maxp@a(SCORE).(STUDENTID) |
A2: use the option @a of A.maxp() function to select all the records with the highest scores, and then extract all the student IDs.
The execution results of A1.maxp@a (SCORE) are as follows:
CLASS | STUDENTID | SUBJECT | SCORE |
---|---|---|---|
1 | 10 | Math | 97 |
1 | 13 | Math | 97 |
We can see that there are two students in Class One with the highest scores in math. Their student numbers are 10 and 13 respectively.
Similarly, SPL also provides the A.minp() function for taking the record where the minimum value is located. If we want to get all the records where the minimum value are, the option @a is also supported in A.minp() function.
In general, people are more interested in aggregated values after grouping, therefore, grouping operations are often accompanied by further aggregation of subsets. By aggregating groups together, we can easily calculate the maximum or minimum values for each group. However, there are times when we are more interested in these grouped subsets than in aggregated values. For example, we want to know some information like the customers’ names, sales personnel, and so on, regardless of the exact number of highest sales.
SPL provides the group() function that we can use to implement authentic grouping, and the result set after grouping is a set of many grouped subsets. This allows the group function to support the operation of taking the record where the maximum value is in each grouped subsets. Let’s look at a simple example of how to get the record where the maximum value is in group.
[e.g. 2] According to the sales table, find the record of the highest monthly sales in 2014. Some of the figures are as follows:
ID | CUSTOMERID | ORDERDATE | SELLERID | AMOUNT |
---|---|---|---|---|
10400 | EASTC | 2014/01/01 | 1 | 3063.0 |
10401 | HANAR | 2014/01/01 | 1 | 3868.6 |
10402 | ERNSH | 2014/01/02 | 8 | 2713.5 |
10403 | ERNSH | 2014/01/03 | 4 | 1005.9 |
10404 | MAGAA | 2014/01/03 | 2 | 1675.0 |
… | … | … | … | … |
We firstly select the sales records of 2014, group them by month, then take out the records with the highest sales from the sales records for each month, and finally merge those records together.
The SPL script looks like this:
A | |
---|---|
1 | =T("Sales.csv").select(year(ORDERDATE)==2014) |
2 | =A1.group(month(ORDERDATE);~.maxp@a(AMOUNT):MAX_AMOUNT) |
3 | =A2.conj(MAX_AMOUNT) |
A1: import the sales table and select the sales figures of 2014.
A2: employ the A.group() function to group the sales records by month and select the records with the highest sales in each grouped subsets.
A3: concatenate the maximum records of all month.
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code