The Select Operation on Structured Data

A select operation gets one or more members from a set according to a specific condition. It is similar to a locate operation. But the latter is concerned with the positions of certain members while the select operation wants the detailed information of a member record. Getting the incomes of the employees whose hire durations are over 10 years and finding names of customers whose order amounts are above 10,000 are among the many examples of select operations. But to perform selections conveniently and efficiently, you’d better read this article carefully where various scenarios are listed and sample scripts in esProc SPL are provided.

1. Get members meeting the specific condition

【Example 1】 List European and African cities with a population of more than 2 million and their populations in column groups (each column group is ordered by population in descending order). Below is part of the world’s urban population table:

Continent	Country	City	Population
Africa	Egypt	Cairo	6789479
Asia	China	Shanghai	24240000
Europe	Britain	London	7285000
…	…	…	…

The expected result:

Europe City	Population	Africa City	Population
Moscow	8389200	Cairo	6789479
London	7285000	Kinshasa	5064000
St Petersburg	4694000	Alexandria	3328196
…	…	…	…

【SPL script】

	A	B
1	=connect("db").query("select * from World where Continent in('Europe','Africa') and Population >= 2000000")	/ Connect to database and get records of European and African cities having a population of over 2 million
2	=A1.select(Continent:"Europe")	/ A.select()function gets records of Europe
3	=A1.select(Continent:"Africa")	/A.select()function gets records of Africa
4	=create('Europe City',Population,'Africa City', Population)	/ Create an empty table according to the target structure
5	=A4.paste(A2.(City),A2.(Population),A3.(City),A3.(Population))	/ A.paste()function pastes values to corresponding columns

A4’s result:

Europe City	Population	Africa City	Population
Moscow	8389200	Cairo	6789479
London	7285000	Kinshasa	5064000
St Petersburg	4694000	Alexandria	3328196
…	…	…	…

【Example 2】 Get full names and salaries of employees in R&D department in the New York state. Below is part of the source data:

ID	NAME	SURNAME	STATE	DEPT	SALARY
1	Rebecca	Moore	California	R&D	7000
2	Ashley	Wilson	New York	Finance	11000
3	Rachel	Johnson	New Mexico	Sales	9000
4	Emily	Smith	Texas	HR	7000
5	Ashley	Smith	Texas	R&D	16000
…	…	…	…	…	…

【SPL script】

	A	B
1	=connect("db")	/ Connect to the data source
2	=A1.query("select * from Employee")	/ Import Employee table
3	=A2.select(STATE=="New York"&&DEPT=="R&D")	/ A.select() function gets records of employees of R&D department in New York; the double ampersand sign ("&&") is used to connect two conditions that need to be met at the same time
4	=A3.new(NAME+""+SURNAME:FULLNAME, SALARY)	/ Use the plus sign (+) to connect the name string and surname string into a full name

A4’s result:

FULLNAME	SALARY
Matthew Johnson	6000
Lauren Thomas	12000
Brooke Williams	12000

【Example 3】 Find the number of employees who are below 30 or over 50 in each department. Below is part of the source data:

ID	NAME	BIRTHDAY	STATE	DEPT	SALARY
1	Rebecca	1974/11/20	California	R&D	7000
2	Ashley	1980/07/19	New York	Finance	11000
3	Rachel	1970/12/17	New Mexico	Sales	9000
4	Emily	1985/03/07	Texas	HR	7000
5	Ashley	1975/05/13	Texas	R&D	16000
…	…	…	…	…	…

【SPL script】

	A	B
1	=connect("db")	/ Connect to the data source
2	=A1.query("select * from Employee")	/ Import Employee table
3	=A2.select((age=age(BIRTHDAY), age<30
4	=A3.groups(DEPT; count(~):Count)	/ Group the selected records and count the employees

A4’s result:

DEPT	Count
4	9
5	5
6	3
7	6
8	1

2. Get the record(s) holding the maximum/minimum value

【Example 4】 Based on the scores table, get the student ID with the lowest math score. Below is part of the source table:

CLASS	STUDENTID	SUBJECT	SCORE
Class one	1	English	84
Class one	1	Math	77
Class one	1	PE	69
Class one	2	English	81
Class one	2	Math	80
…	…	…	…

【SPL script】

	A	B
1	=connect("db")	/ Connect to the database
2	=A1.query("select * from Scores where SUBJECT='Math'and CLASS='Class one'")	/ Get math records for class one
3	=A2.minp(SCORE)	/A.minp() function gets the record having the lowest score
4	=A3.STUDENTID	/ Get the student ID from A3’s record

There could be more than one record that contains the minimum value. To return all eligible records, you can use @a option in A.minp() function:

	A	B
3	=A2.minp@a(SCORE)	/ A.minp() function works with @a option to get all records containing the lowest score
4	=A3.(STUDENTID)	/ Get student IDs from A3’s records

A3’s result:

CLASS	STUDENTID	SUBJECT	SCORE
Class one	5	Math	60
Class one	14	Math	60

A4’s result:

Member
5
14

【Example 5】 Based on the Olympics medal table, get all information of the country that holds the top place for the longest time in terms of total medals. Below is part of the source table:

Game	Nation	Gold	Silver	Copper
30	USA	46	29	29
30	China	38	27	23
30	UK	29	17	19
30	Russia	24	26	32
30	Korea	13	8	7
…	…	…	…	…

【SPL script】

	A	B
1	=file("Olympic.csv").import@cqt()	/ Import the Olympic medal tally file
2	=A1.sort@z(Game, 1000000Gold+1000Silver+Copper)	/ Sort records by games and the totals
3	=A2.group@o1(Game)	/ Get the one record for each game, which is the first record because records are already sorted in descending order
4	=A3.group@o(Nation)	/ Group the selected records by nations in their original order
5	=A4.maxp(~.len())	/ A.maxp() selects the group having the most records, which are the country that holds the top place in a row for the longest time

A5’s result:

Game	Nation	Gold	Silver	Copper
10	USA	41	32	30
9	USA	22	18	16
8	USA	45	27	27
7	USA	41	27	28

3. Get data by ranges

Sometimes we need to first find the sequence number of the range where a value belongs to and then get its corresponding members in a set. Examples include returning the assessment (Excellent, Good, Average, Failed) according to a student’s score, and finding a household’s income range (Below or near poverty level, Low income, middle class, Upper middle class) according to its annually income.

【Example 6】 Based on the score table, find the number of students in Excellent, Pass, and Failed ranges respectively for the English subject. Below is part of the source data:

CLASS	STUDENTID	SUBJECT	SCORE
Class one	1	English	84
Class one	1	Math	77
Class one	1	PE	69
Class one	2	English	81
Class one	2	Math	80
…	…	…	…

【SPL script】

	A	B
1	=connect("db").query("select * from Scores where SUBJECT='English'")	/ Connect to the database and get records of English subject
2	=create(Assessment,Score).record(["fail",0,"pass",60,"excellent",90])	/ Create a lookup table of score ranges and assessments
3	=A1.derive(A2.segp(Score,SCORE).Assessment:Assessment)	/ A.segp() function gets assessment for each score according to the sequence number of the range where the score falls in
4	=A3.groups(Assessment;count(1):Count)	/ Group A3’s records by assessment and count students in each group

A4’s result:

Assessment	Count
excellent	6
Fail	4
Pass	18

4. Get the top/bottom N values

【Example 7】 Based on the score table, for each subject in every class, find the top 2 students’ IDs in terms of score. Below is part of the source data:

CLASS	STUDENTID	SUBJECT	SCORE
Class one	1	English	84
Class one	1	Math	77
Class one	1	PE	69
Class one	2	English	81
Class one	2	Math	80
…	…	…	…

【SPL script】

	A	B
1	=connect("db")	/ Connect to the database
2	=A1.query("select * from Scores")	/ Query Scores table
3	=A2.group(CLASS,SUBJECT;~.top(-2;SCORE):TOP2)	/ A.top() function gets the records of top 2 students for each subject in each class; -2 means getting two records in descending order
4	=A3.conj(TOP2)	/ Concatenate all top 2 records together

A4’s result:

CLASS	STUDENTID	SUBJECT	SCORE
Class one	4	English	96
Class one	9	English	93
Class one	13	Math	97
Class one	10	Math	97
…	…	…	…

5. Query records by primary key values

It’s common to locate records according to primary key values, such as finding employee records by employee IDs, querying detailed orders data by orders IDs, etc.

【Example 8】 Based on the associated Course table and SelectCourse table, list a table of course selection information where each course occupies a column. Below is part of the Course table:

ID	NAME	TEACHERID
1	Environmental protection and …	5
2	Mental health of College Students	1
3	Computer language Matlab	8
…	…	…

Here is part of the SelectCourse table:

ID	STUDENT_NAME	COURSE
1	Rebecca	2,7
2	Ashley	1,8
3	Rachel	2,7,10
…	…	…

The expected result:

ID	STUDENT_NAME	COURSE1	COURSE2	COURSE3	…
1	Rebecca	Mental health of College Students	Into Shakespeare		…
2	Ashley	Environmental protection and …	Modern economics		…
3	Rachel	Mental health of College Students	Into Shakespeare	Music appreciation	…
…	…	…	…	…	…

【SPL script】

	A	B
1	=connect("db")	/Connect to the database
2	=A1.query("select * from Course").keys(ID)	/ Query Course table and set ID as the primary key
3	=A1.query("select * from SelectCourse")	/ Query SelectCourse table
4	=A3.run(COURSE=COURSE.split@cp())	/ Split each Course value in SelectCourse table by comma and reassign it to the field
5	=A4.max(COURSE.len())	/ Get the largest number of selected courses
6	=create(ID,STUDENT_NAME, ${A5.("COURSE"+string(~)).concat@c()})	/ Create an empty table where the number of course columns is the number A5 gets
7	>A4.run(A6.record([ID,STUDENT_NAME]	COURSE.(A2.find(~).Name)))

A6’s result:

ID	STUDENT_NAME	COURSE1	COURSE2	COURSE3
1	Rebecca	Mental health of College Students	Into Shakespeare
2	Ashley	Environmental protection and …	Modern economics
3	Rachel	Mental health of College Students	Into Shakespeare	Music appreciation
…	…	…	…	…

Find more examples in SPL CookBook.

SPL Resource： SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The Select Operation on Structured Data

1. Get members meeting the specific condition

2. Get the record(s) holding the maximum/minimum value

3. Get data by ranges

4. Get the top/bottom N values

5. Query records by primary key values

Clone this wiki locally