This repository has been archived by the owner on Jun 14, 2021. It is now read-only.

Using GeoBase as a grep on steroids

alexprengere edited this page Mar 20, 2013 · 7 revisions

The GeoBase command-line tool lets you load data from stdin and perform queries on it. These queries may be, for example:

  • a match on a particular field
  • a fuzzy match on a particular field
  • a phonetic match on a particular field

Let's take a simple example:

$ cat data.csv
A,Paris,France
B,Lyon,France
C,London,England
D,Madrid,Spain

We can pipe that into the GeoBase command-line tool, define the delimiter, and name the fields. The default output is a fancy terminal display.

$ cat data.csv | GeoBase -i ',' id/name/country
Keeping 4 result(s) from 4 initially...

id                   A                    D                    B                    C                    
name                 Paris                Madrid               Lyon                 London               
country              France               Spain                France               England              
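To make the "delimiter plus field names" idea concrete, here is a minimal Python sketch of the same loading step using only the standard library. It is not GeoBase's own code; the helper name load_rows and its parameters are hypothetical and chosen for illustration.

```python
import csv
import io

# Same sample rows as data.csv above.
DATA = "A,Paris,France\nB,Lyon,France\nC,London,England\nD,Madrid,Spain\n"


def load_rows(stream, delimiter=",", fields=("id", "name", "country")):
    """Return one dict per input line, keyed by the given field names.

    Hypothetical helper: it mirrors the idea of `-i ',' id/name/country`,
    not the actual GeoBase implementation.
    """
    reader = csv.DictReader(stream, fieldnames=list(fields), delimiter=delimiter)
    return list(reader)


rows = load_rows(io.StringIO(DATA))
for row in rows:
    print(row["id"], row["name"], row["country"])
```

In a real pipeline the stream would be sys.stdin rather than an in-memory string.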

Now we can perform queries by adding the --exact, --fuzzy, or --phonetic options.

Exact search

Exact searches are just normal matches: a row is kept only if the field equals the given value :).

Example of exact search on field country (configured with -E), for value Spain:

$ cat data.csv | GeoBase -i ',' id/name/country --exact Spain -E country
(*) Applying: field country == "Spain"
Keeping 1 result(s) from 1 initially...

id                                      D                                       
name                                    Madrid                                  
country                                 Spain                                   
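The same filter can be sketched in a few lines of plain Python. This only illustrates what field country == "Spain" means on the sample data; the function name exact_search is hypothetical and not part of GeoBase.

```python
import csv
import io

# Same sample rows as data.csv above.
DATA = "A,Paris,France\nB,Lyon,France\nC,London,England\nD,Madrid,Spain\n"
FIELDS = ["id", "name", "country"]


def exact_search(rows, field, value):
    """Keep only the rows whose `field` is exactly equal to `value`."""
    return [row for row in rows if row[field] == value]


rows = list(csv.DictReader(io.StringIO(DATA), fieldnames=FIELDS))
matches = exact_search(rows, "country", "Spain")
for row in matches:
    print(row["id"], row["name"], row["country"])
```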

Fuzzy search

Fuzzy searches are searches based on string distance (using a modified Levenshtein distance).

Example of fuzzy search on field name (default if not configured with -F), for value Lfndo:

$ cat data.csv | GeoBase -i ',' id/name/country --fuzzy Lfndo
(*) Applying: field name ~= "Lfndo" (70.0%)
Keeping 1 result(s) from 1 initially...

__ref__                                 72.7 %                                  
id                                      C                                       
name                                    London                                  
country                                 England                                 
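As a rough illustration of distance-based matching, the sketch below computes a similarity ratio from a Levenshtein variant in which a substitution costs 2 (one deletion plus one insertion). This weighting is an assumption, not a confirmed description of GeoBase internals, and the function names are hypothetical. On this example it gives 8/11 for Lfndo against London, which is consistent with the 72.7 % shown above.

```python
def weighted_distance(a, b):
    """Edit distance where insert/delete cost 1 and substitution costs 2."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            substitute = prev[j - 1] + (0 if ca == cb else 2)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, substitute))
        prev = cur
    return prev[-1]


def ratio(a, b):
    """Similarity in [0, 1]: 1.0 for identical strings, 0.0 for nothing shared."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return (total - weighted_distance(a, b)) / total


names = ["Paris", "Lyon", "London", "Madrid"]
threshold = 0.70  # same 70.0% threshold as in the output above
matches = [(name, ratio("Lfndo", name)) for name in names
           if ratio("Lfndo", name) >= threshold]
for name, score in matches:
    print(name, "%.1f %%" % (score * 100))
```

Only London passes the threshold here; Lyon, the next closest name, scores 4/9.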

Phonetic search

Phonetic searches match values by how they sound when pronounced.

Example of phonetic search on field name (default if not configured with -P), for value periss:

$ cat data.csv | GeoBase -i ',' id/name/country --phonetic periss 
(*) Applying: field name sounds ~ "periss" with dmetaphone
Keeping 1 result(s) from 1 initially...

__ref__                                 PRS/None                                
id                                      A                                       
name                                    Paris                                   
country                                 France                                  
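The idea behind phonetic matching is to reduce each value to a key that approximates its pronunciation, then compare keys instead of raw strings. The sketch below uses a deliberately crude stand-in, a consonant skeleton, and is not Double Metaphone (the dmetaphone algorithm named in the output above); the function name sound_key is hypothetical. It only shows why periss and Paris can end up with the same key.

```python
def sound_key(word):
    """Crude phonetic key: NOT Double Metaphone, just a consonant skeleton.

    Steps: uppercase, keep letters only, collapse repeated letters,
    then drop vowels (A, E, I, O, U, Y) except in first position.
    """
    letters = [c for c in word.upper() if c.isalpha()]
    collapsed = []
    for c in letters:
        if not collapsed or collapsed[-1] != c:
            collapsed.append(c)
    head, tail = collapsed[:1], collapsed[1:]
    return "".join(head + [c for c in tail if c not in "AEIOUY"])


names = ["Paris", "Lyon", "London", "Madrid"]
query_key = sound_key("periss")
matches = [name for name in names if sound_key(name) == query_key]
print(query_key, matches)
```

A real phonetic algorithm handles many more cases (silent letters, digraphs such as "ph", alternate pronunciations), which this simplification ignores.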

Tweaking output

Of course, you can dump the results in other formats, such as the CSV-like display enabled with --quiet.

$ cat data.csv | GeoBase -i ',' id/name/country --phonetic periss --quiet
#__ref__^__key__^id^name^country
PRS/None^Paris^A^Paris^France
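Because the --quiet output is plain delimited text, it is easy to consume from another program. Here is a minimal sketch that parses the two lines shown above, assuming the first line is a header prefixed with # and that ^ is the delimiter; the helper name parse_quiet is hypothetical.

```python
import csv

# The --quiet output shown above, copied verbatim.
QUIET_OUTPUT = (
    "#__ref__^__key__^id^name^country\n"
    "PRS/None^Paris^A^Paris^France\n"
)


def parse_quiet(text, delimiter="^"):
    """Parse header + rows; assumes line 1 is a '#'-prefixed header."""
    lines = text.splitlines()
    header = lines[0].lstrip("#").split(delimiter)
    reader = csv.DictReader(lines[1:], fieldnames=header, delimiter=delimiter)
    return list(reader)


records = parse_quiet(QUIET_OUTPUT)
for record in records:
    print(record["id"], record["name"], record["__ref__"])
```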

A few things you can control:

  • the displayed fields, with --show; for example, --show id name.
  • the --quiet display, with -Q; for example, add -Q ',' N to change the delimiter and remove the header.

For advanced usage, refer to --help.