Join clauses tutorial

You can use join clauses between you different data sources.

Here is a concrete example. Suppose you have one file containing some points definition, like this:

$ cat data.csv
#id,lat,lng
p0,39.1,2.5
p1,41.1,2.6
p2,42.1,2.7
p3,43.1,2.8

The first thing we want to do is make data.csv a permanent data source of GeoBase. Easy! Just use theadmin mode as described in this page: Admin mode tutorial.

Once you are done, you should be able to do this, assuming you named the source mine:

$ GeoBase --base mine

Now let's suppose we have another file defining lines referring to the first file.

$ cat lines.csv
#id1,id2
p0,p1
p1,p2
p2,p3

We want to tell GeoBase that it should look for the keys p0, p1 in the file data.csv, now a data source named mine. This is when you define a join clause. We are going to review the different ways to do it.

On the fly with the command line interface

If you are not all familiar with the definition of metadata with -i, you may want to read Using your own data with GeoBases first.

The -i option lets you define the meaning of each field with the command line. It has three arguments (in fact more, but here it does not matter):

$ cat lines.csv | GeoBase -i delimiter headers indices

With:

delimiter for, well, the column delimiter
headers for column names separated by a '/'
indices for columns used for key generation, also separated by a '/'

In our case we can do:

$ cat lines.csv | GeoBase -i ',' origin/destination

This will name the columns origin and destination, but will not tell GeoBase that these fields can be joined with another source. Here is the real deal:

$ cat lines.csv | GeoBase -i ',' origin{mine:id}/destination{mine:id} --map

The syntax is field{external source:external field} for each column. The external source here is mine, which we just defined, and here I suppose that mine has a field id (it depends on what you gave with --admin as column names).

Now you get a consistent map display. Since your element of the file lines.csv has no proper geocode, GeoBase will look in the join base for geocode and display lines between the geocodes found, thus giving the lines on the map.

lines

Persistent with the command line interface

We can also use the admin mode to permanently add the lines.csv file to the data sources, add specify the join clause when doing so.

[ 0 ] Command: add
[ 1 ] Source name: lines
[2/8] Path: ./lines.csv
[   ] Is the path local to the sources directory [yN]? 
[   ] Use lines.csv as primary source from /home/alex/.GeoBases.d [yN]? Y

>>>>> first line >>>>>
#id1,id2
<<<<<<<<<<<<<<<<<<<<<<
[   ] Add another path [yN]? 
[3/8] Delimiter: ,
[4/8] Headers: origin/destination
[5/8] Key field(s): destination
[   ] Add new index [yN]? 
[   ] Add new join [yN]? y
[7/8] Join clause: origin{mine:id}
[   ] Add new join [yN]? y
[7/8] Join clause: destination{mine:id}

--- [before]
lines: {}

+++ [after]
lines:
    delimiter: ','
    headers:
    - origin
    - destination
    indices: []
    join:
    -   fields: origin
        with:
        - mine
        - id
    -   fields: destination
        with:
        - mine
        - id
    key_fields: destination
    paths:
    -   file: /home/alex/.GeoBases.d/lines/lines.csv
        local: false

[8/8] Confirm [Yn]? 
===== Changes saved to /home/alex/.GeoBases.d/Sources.yaml

Now GeoBase -b lines is working, and as usual we can do what we want with this source.

Note that you can query the external base and display its content with --show field:external field:

$ GeoBase -b lines --show origin:lat origin:lng --quiet
#origin:lat^origin:lng
41.1^2.6
42.1^2.7
39.1^2.5

And of course you can all kinks of visualization:

$ GeoBase -b lines --graph -G origin destination

From the Python API

If you used the permanent method above, you should be able to do things like:

>>> from GeoBases import GeoBase
>>> mine = GeoBase('mine', verbose=False)
>>> mine.getLocation('p1')
(41.0, 2.6)

Then for the lines sources:

>>> from GeoBases import GeoBase
>>> lines = GeoBase('lines', verbose=False)
>>> lines.visualize()

If you did not set lines as a permanent data source, you can define the join clause during init (use data feed for on-the-fly definition):

from GeoBases import GeoBase
join = [{
    'fields' : 'origin',
    'with' : ['mine', 'id']
}, {
    'fields' : 'destination',
    'with' : ['mine', 'id']
}]
lines = GeoBase('feed',
                source=open('lines.csv'),
                delimiter=',',
                headers=['origin', 'destination'],
                join=join)
lines.visualize()

A bit further

That is it! You are now a GeoBase join clauses expert.

It is possible to define join clauses on multiple fields for one clause:

give a list of fields with the Python API
give fields separated with "/" on the command line (in admin mode, or using an extra argument of -i, check --help)
check examples from the main configuration file here

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Join clauses tutorial

On the fly with the command line interface

Persistent with the command line interface

From the Python API

A bit further

Clone this wiki locally