-
Notifications
You must be signed in to change notification settings - Fork 41
Join clauses tutorial
You can use join clauses between you different data sources.
Here is a concrete example. Suppose you have one file containing some points definition, like this:
$ cat data.csv
#id,lat,lng
p0,39.1,2.5
p1,41.1,2.6
p2,42.1,2.7
p3,43.1,2.8
The first thing we want to do is make data.csv
a permanent data source of GeoBase. Easy! Just use theadmin
mode as described in this page: Admin mode tutorial.
Once you are done, you should be able to do this, assuming you named the source mine:
$ GeoBase --base mine
Now let's suppose we have another file defining lines referring to the first file.
$ cat lines.csv
#id1,id2
p0,p1
p1,p2
p2,p3
We want to tell GeoBase that it should look for the keys p0, p1 in the file data.csv, now a data source named mine. This is when you define a join clause. We are going to review the different ways to do it.
If you are not all familiar with the definition of metadata with -i
, you may want to read Using your own data with GeoBases first.
The -i
option lets you define the meaning of each field with the command line.
It has three arguments (in fact more, but here it does not matter):
$ cat lines.csv | GeoBase -i delimiter headers indices
With:
- delimiter for, well, the column delimiter
- headers for column names separated by a '/'
- indices for columns used for key generation, also separated by a '/'
In our case we can do:
$ cat lines.csv | GeoBase -i ',' origin/destination
This will name the columns origin and destination, but will not tell GeoBase that these fields can be joined with another source. Here is the real deal:
$ cat lines.csv | GeoBase -i ',' origin{mine:id}/destination{mine:id} --map
The syntax is field{external source:external field} for each column. The external source here is mine, which we just defined, and here I suppose that mine has a field id (it depends on what you gave with --admin
as column names).
Now you get a consistent map display. Since your element of the file lines.csv
has no proper geocode, GeoBase will look in the join base for geocode and display lines between the geocodes found, thus giving the lines on the map.
We can also use the admin mode to permanently add the lines.csv
file to the data sources, add specify the join clause when doing so.
[ 0 ] Command: add
[ 1 ] Source name: lines
[2/8] Path: ./lines.csv
[ ] Is the path local to the sources directory [yN]?
[ ] Use lines.csv as primary source from /home/alex/.GeoBases.d [yN]? Y
>>>>> first line >>>>>
#id1,id2
<<<<<<<<<<<<<<<<<<<<<<
[ ] Add another path [yN]?
[3/8] Delimiter: ,
[4/8] Headers: origin/destination
[5/8] Key field(s): destination
[ ] Add new index [yN]?
[ ] Add new join [yN]? y
[7/8] Join clause: origin{mine:id}
[ ] Add new join [yN]? y
[7/8] Join clause: destination{mine:id}
--- [before]
lines: {}
+++ [after]
lines:
delimiter: ','
headers:
- origin
- destination
indices: []
join:
- fields: origin
with:
- mine
- id
- fields: destination
with:
- mine
- id
key_fields: destination
paths:
- file: /home/alex/.GeoBases.d/lines/lines.csv
local: false
[8/8] Confirm [Yn]?
===== Changes saved to /home/alex/.GeoBases.d/Sources.yaml
Now GeoBase -b lines
is working, and as usual we can do what we want with this source.
Note that you can query the external base and display its content with --show field:external field
:
$ GeoBase -b lines --show origin:lat origin:lng --quiet
#origin:lat^origin:lng
41.1^2.6
42.1^2.7
39.1^2.5
And of course you can all kinks of visualization:
$ GeoBase -b lines --graph -G origin destination
If you used the permanent method above, you should be able to do things like:
>>> from GeoBases import GeoBase
>>> mine = GeoBase('mine', verbose=False)
>>> mine.getLocation('p1')
(41.0, 2.6)
Then for the lines sources:
>>> from GeoBases import GeoBase
>>> lines = GeoBase('lines', verbose=False)
>>> lines.visualize()
If you did not set lines as a permanent data source, you can define the join clause during init (use data feed for on-the-fly definition):
from GeoBases import GeoBase
join = [{
'fields' : 'origin',
'with' : ['mine', 'id']
}, {
'fields' : 'destination',
'with' : ['mine', 'id']
}]
lines = GeoBase('feed',
source=open('lines.csv'),
delimiter=',',
headers=['origin', 'destination'],
join=join)
lines.visualize()
That is it! You are now a GeoBase join clauses expert.
It is possible to define join clauses on multiple fields for one clause:
- give a list of fields with the Python API
- give fields separated with "/" on the command line (in admin mode, or using an extra argument of
-i
, check--help
) - check examples from the main configuration file here