Skip to content

The Select Operation on Structured Data

esProcSPL edited this page Aug 12, 2024 · 1 revision

A select operation gets one or more members from a set according to a specific condition. It is similar to a locate operation. But the latter is concerned with the positions of certain members while the select operation wants the detailed information of a member record. Getting the incomes of the employees whose hire durations are over 10 years and finding names of customers whose order amounts are above 10,000 are among the many examples of select operations. But to perform selections conveniently and efficiently, you’d better read this article carefully where various scenarios are listed and sample scripts in esProc SPL are provided.

1. Get members meeting the specific condition

【Example 1】 List European and African cities with a population of more than 2 million and their populations in column groups (each column group is ordered by population in descending order). Below is part of the world’s urban population table:

Continent Country City Population
Africa Egypt Cairo 6789479
Asia China Shanghai 24240000
Europe Britain London 7285000

The expected result:

Europe City Population Africa City Population
Moscow 8389200 Cairo 6789479
London 7285000 Kinshasa 5064000
St Petersburg 4694000 Alexandria 3328196

【SPL script】

A B
1 =connect("db").query("select * from World where Continent in('Europe','Africa') and Population >= 2000000") / Connect to database and get records of European and African cities having a population of over 2 million
2 =A1.select(Continent:"Europe") / A.select()function gets records of Europe
3 =A1.select(Continent:"Africa") /A.select()function gets records of Africa
4 =create('Europe City',Population,'Africa City', Population) / Create an empty table according to the target structure
5 =A4.paste(A2.(City),A2.(Population),A3.(City),A3.(Population)) / A.paste()function pastes values to corresponding columns

A4’s result:

Europe City Population Africa City Population
Moscow 8389200 Cairo 6789479
London 7285000 Kinshasa 5064000
St Petersburg 4694000 Alexandria 3328196

【Example 2】 Get full names and salaries of employees in R&D department in the New York state. Below is part of the source data:

ID NAME SURNAME STATE DEPT SALARY
1 Rebecca Moore California R&D 7000
2 Ashley Wilson New York Finance 11000
3 Rachel Johnson New Mexico Sales 9000
4 Emily Smith Texas HR 7000
5 Ashley Smith Texas R&D 16000

【SPL script】

A B
1 =connect("db") / Connect to the data source
2 =A1.query("select * from Employee") / Import Employee table
3 =A2.select(STATE=="New York"&&DEPT=="R&D") / A.select() function gets records of employees of R&D department in New York; the double ampersand sign ("&&") is used to connect two conditions that need to be met at the same time
4 =A3.new(NAME+""+SURNAME:FULLNAME, SALARY) / Use the plus sign (+) to connect the name string and surname string into a full name

A4’s result:

FULLNAME SALARY
Matthew Johnson 6000
Lauren Thomas 12000
Brooke Williams 12000

【Example 3】 Find the number of employees who are below 30 or over 50 in each department. Below is part of the source data:

ID NAME BIRTHDAY STATE DEPT SALARY
1 Rebecca 1974/11/20 California R&D 7000
2 Ashley 1980/07/19 New York Finance 11000
3 Rachel 1970/12/17 New Mexico Sales 9000
4 Emily 1985/03/07 Texas HR 7000
5 Ashley 1975/05/13 Texas R&D 16000

【SPL script】

A B
1 =connect("db") / Connect to the data source
2 =A1.query("select * from Employee") / Import Employee table
3 =A2.select((age=age(BIRTHDAY), age<30
4 =A3.groups(DEPT; count(~):Count) / Group the selected records and count the employees

A4’s result:

DEPT Count
4 9
5 5
6 3
7 6
8 1

2. Get the record(s) holding the maximum/minimum value

【Example 4】 Based on the scores table, get the student ID with the lowest math score. Below is part of the source table:

CLASS STUDENTID SUBJECT SCORE
Class one 1 English 84
Class one 1 Math 77
Class one 1 PE 69
Class one 2 English 81
Class one 2 Math 80

【SPL script】

A B
1 =connect("db") / Connect to the database
2 =A1.query("select * from Scores where SUBJECT='Math'and CLASS='Class one'") / Get math records for class one
3 =A2.minp(SCORE) /A.minp() function gets the record having the lowest score
4 =A3.STUDENTID / Get the student ID from A3’s record

There could be more than one record that contains the minimum value. To return all eligible records, you can use @a option in A.minp() function:

A B
3 =A2.minp@a(SCORE) / A.minp() function works with @a option to get all records containing the lowest score
4 =A3.(STUDENTID) / Get student IDs from A3’s records

A3’s result:

CLASS STUDENTID SUBJECT SCORE
Class one 5 Math 60
Class one 14 Math 60

A4’s result:

Member
5
14

【Example 5】 Based on the Olympics medal table, get all information of the country that holds the top place for the longest time in terms of total medals. Below is part of the source table:

Game Nation Gold Silver Copper
30 USA 46 29 29
30 China 38 27 23
30 UK 29 17 19
30 Russia 24 26 32
30 Korea 13 8 7

【SPL script】

A B
1 =file("Olympic.csv").import@cqt() / Import the Olympic medal tally file
2 =A1.sort@z(Game, 1000000*Gold+1000*Silver+Copper) / Sort records by games and the totals
3 =A2.group@o1(Game) / Get the one record for each game, which is the first record because records are already sorted in descending order
4 =A3.group@o(Nation) / Group the selected records by nations in their original order
5 =A4.maxp(~.len()) / A.maxp() selects the group having the most records, which are the country that holds the top place in a row for the longest time

A5’s result:

Game Nation Gold Silver Copper
10 USA 41 32 30
9 USA 22 18 16
8 USA 45 27 27
7 USA 41 27 28

3. Get data by ranges

Sometimes we need to first find the sequence number of the range where a value belongs to and then get its corresponding members in a set. Examples include returning the assessment (Excellent, Good, Average, Failed) according to a student’s score, and finding a household’s income range (Below or near poverty level, Low income, middle class, Upper middle class) according to its annually income.

【Example 6】 Based on the score table, find the number of students in Excellent, Pass, and Failed ranges respectively for the English subject. Below is part of the source data:

CLASS STUDENTID SUBJECT SCORE
Class one 1 English 84
Class one 1 Math 77
Class one 1 PE 69
Class one 2 English 81
Class one 2 Math 80

【SPL script】

A B
1 =connect("db").query("select * from Scores where SUBJECT='English'") / Connect to the database and get records of English subject
2 =create(Assessment,Score).record(["fail",0,"pass",60,"excellent",90]) / Create a lookup table of score ranges and assessments
3 =A1.derive(A2.segp(Score,SCORE).Assessment:Assessment) / A.segp() function gets assessment for each score according to the sequence number of the range where the score falls in
4 =A3.groups(Assessment;count(1):Count) / Group A3’s records by assessment and count students in each group

A4’s result:

Assessment Count
excellent 6
Fail 4
Pass 18

4. Get the top/bottom N values

【Example 7】 Based on the score table, for each subject in every class, find the top 2 students’ IDs in terms of score. Below is part of the source data:

CLASS STUDENTID SUBJECT SCORE
Class one 1 English 84
Class one 1 Math 77
Class one 1 PE 69
Class one 2 English 81
Class one 2 Math 80

【SPL script】

A B
1 =connect("db") / Connect to the database
2 =A1.query("select * from Scores") / Query Scores table
3 =A2.group(CLASS,SUBJECT;~.top(-2;SCORE):TOP2) / A.top() function gets the records of top 2 students for each subject in each class; -2 means getting two records in descending order
4 =A3.conj(TOP2) / Concatenate all top 2 records together

A4’s result:

CLASS STUDENTID SUBJECT SCORE
Class one 4 English 96
Class one 9 English 93
Class one 13 Math 97
Class one 10 Math 97

5. Query records by primary key values

It’s common to locate records according to primary key values, such as finding employee records by employee IDs, querying detailed orders data by orders IDs, etc.

【Example 8】 Based on the associated Course table and SelectCourse table, list a table of course selection information where each course occupies a column. Below is part of the Course table:

ID NAME TEACHERID
1 Environmental protection and … 5
2 Mental health of College Students 1
3 Computer language Matlab 8

Here is part of the SelectCourse table:

ID STUDENT_NAME COURSE
1 Rebecca 2,7
2 Ashley 1,8
3 Rachel 2,7,10

The expected result:

ID STUDENT_NAME COURSE1 COURSE2 COURSE3
1 Rebecca Mental health of College Students Into Shakespeare
2 Ashley Environmental protection and … Modern economics
3 Rachel Mental health of College Students Into Shakespeare Music appreciation

【SPL script】

A B
1 =connect("db") /Connect to the database
2 =A1.query("select * from Course").keys(ID) / Query Course table and set ID as the primary key
3 =A1.query("select * from SelectCourse") / Query SelectCourse table
4 =A3.run(COURSE=COURSE.split@cp()) / Split each Course value in SelectCourse table by comma and reassign it to the field
5 =A4.max(COURSE.len()) / Get the largest number of selected courses
6 =create(ID,STUDENT_NAME, ${A5.("COURSE"+string(~)).concat@c()}) / Create an empty table where the number of course columns is the number A5 gets
7 >A4.run(A6.record([ID,STUDENT_NAME] COURSE.(A2.find(~).Name)))

A6’s result:

ID STUDENT_NAME COURSE1 COURSE2 COURSE3
1 Rebecca Mental health of College Students Into Shakespeare
2 Ashley Environmental protection and … Modern economics
3 Rachel Mental health of College Students Into Shakespeare Music appreciation

Find more examples in SPL CookBook.

Clone this wiki locally