Using GeoBase as a grep on steroids
The GeoBase command line tool lets you load data from stdin and perform queries on it. These queries may be, for example:
- a match on a particular field
- a fuzzy match on a particular field
- a phonetic match on a particular field
Let's take a simple example:
$ cat data.csv
A,Paris,France
B,Lyon,France
C,London,England
D,Madrid,Spain
We can pipe that into the GeoBase command line tool, define the delimiter with -i, and name the fields. The default output is a fancy terminal display.
$ cat data.csv | GeoBase -i ',' id/name/country
Keeping 4 result(s) from 4 initially...
id A D B C
name Paris Madrid Lyon London
country France Spain France England
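To make the loading step concrete, here is a minimal Python sketch of what naming the fields amounts to: splitting each line on the delimiter and pairing the values with the names given as id/name/country. The function name load_rows and the inline SAMPLE string are hypothetical, introduced only for illustration; this is not GeoBase's own code.

```python
import csv
import io

# Same content as data.csv above, inlined so the sketch is self-contained.
SAMPLE = "A,Paris,France\nB,Lyon,France\nC,London,England\nD,Madrid,Spain\n"


def load_rows(text, delimiter=",", fields="id/name/country"):
    """Split delimited text into one dict per line, keyed by field name.

    Hypothetical helper: the slash-separated `fields` string mirrors the
    way fields are named on the command line above.
    """
    names = fields.split("/")
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Skip empty lines; zip pairs each value with its field name.
    return [dict(zip(names, row)) for row in reader if row]


rows = load_rows(SAMPLE)
for row in rows:
    print(row["id"], row["name"], row["country"])
```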
Now we can perform queries by adding the --exact, --fuzzy, or --phonetic options.
Exact searches are just normal matches :).
Example of exact search on field country (configured with -E), for value Spain:
$ cat data.csv | GeoBase -i ',' id/name/country --exact Spain -E country
(*) Applying: field country == "Spain"
Keeping 1 result(s) from 1 initially...
id D
name Madrid
country Spain
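Conceptually, the exact search above is a plain equality filter on one field. A minimal Python sketch follows, assuming the rows are already loaded as dictionaries; the function name exact_match is hypothetical and not part of GeoBase.

```python
# Rows as they would look once the fields are named id/name/country.
ROWS = [
    {"id": "A", "name": "Paris", "country": "France"},
    {"id": "B", "name": "Lyon", "country": "France"},
    {"id": "C", "name": "London", "country": "England"},
    {"id": "D", "name": "Madrid", "country": "Spain"},
]


def exact_match(rows, field, value):
    """Keep only the rows whose `field` is strictly equal to `value`."""
    return [row for row in rows if row.get(field) == value]


for row in exact_match(ROWS, "country", "Spain"):
    print(row)
```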
Fuzzy searches are searches based on string distance (using a modified Levenshtein distance).
Example of fuzzy search on field name (default if not configured with -F), for value Lfndo:
$ cat data.csv | GeoBase -i ',' id/name/country --fuzzy Lfndo
(*) Applying: field name ~= "Lfndo" (70.0%)
Keeping 1 result(s) from 1 initially...
__ref__ 72.7 %
id C
name London
country England
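The page does not say how the Levenshtein distance is modified, so the following is only a sketch under an assumption: a common variant charges 2 for a substitution and normalises by the combined length of both strings. For Lfndo against London that formula gives 8/11, which matches the 72.7 % displayed above, but whether GeoBase computes the score this way is not confirmed here. The function names and the 0.70 default (taken from the 70.0% in the output) are illustrative.

```python
def weighted_distance(a, b):
    """Edit distance where insert/delete cost 1 and substitution costs 2."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 2
            cur.append(min(prev[j] + 1,          # delete from a
                           cur[j - 1] + 1,       # insert into a
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]: 1.0 for identical strings, 0.0 for disjoint ones."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return (total - weighted_distance(a, b)) / total


def fuzzy_match(names, value, min_ratio=0.70):
    """Return (score, name) pairs whose similarity reaches the threshold."""
    scored = [(similarity(value, name), name) for name in names]
    return [(score, name) for score, name in scored if score >= min_ratio]


for score, name in fuzzy_match(["Paris", "Lyon", "London", "Madrid"], "Lfndo"):
    print(f"{name}: {score * 100:.1f} %")
```

With the sample data, only London clears the threshold; the other three names share too few letters with Lfndo.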
Phonetic searches are searches based on the sound of things when pronounced.
Example of phonetic search on field name (default if not configured with -P), for value periss:
$ cat data.csv | GeoBase -i ',' id/name/country --phonetic periss
(*) Applying: field name sounds ~ "periss" with dmetaphone
Keeping 1 result(s) from 1 initially...
__ref__ PRS/None
id A
name Paris
country France
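The output above names dmetaphone (Double Metaphone) as the phonetic method. That algorithm is too long to reproduce here, so the sketch below swaps in classic Soundex, a much simpler phonetic code, purely to illustrate the idea: two strings match when they reduce to the same code. The codes it produces (such as P620) are not the PRS/None values GeoBase prints, and the function names are hypothetical.

```python
# Soundex digit for each consonant group; vowels, h, w and y have no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Classic 4-character Soundex code (a stand-in, NOT Double Metaphone)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0].upper()]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        if ch not in "hw":  # h and w do not separate equal codes
            prev = code
    return ("".join(out) + "000")[:4]


def phonetic_match(names, value):
    """Keep the names that share a phonetic code with `value`."""
    target = soundex(value)
    return [name for name in names if soundex(name) == target]


print(phonetic_match(["Paris", "Lyon", "London", "Madrid"], "periss"))
```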
Of course, you can dump the results with other displays, such as the CSV display enabled by --quiet.
$ cat data.csv | GeoBase -i ',' id/name/country --phonetic periss --quiet
#__ref__^__key__^id^name^country
PRS/None^Paris^A^Paris^France
A few things you can control:
- the displayed fields with the --show parameter, for example --show id name
- the --quiet display with -Q, for example add -Q ',' N to change the delimiter and remove the header
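As a rough illustration of these display controls, the Python sketch below writes selected fields with a configurable delimiter and an optional header line starting with #, loosely imitating the quiet output shown earlier. It leaves out the __ref__ and __key__ columns, and the function name dump and its parameters are hypothetical rather than GeoBase's implementation.

```python
ROWS = [
    {"id": "A", "name": "Paris", "country": "France"},
    {"id": "D", "name": "Madrid", "country": "Spain"},
]


def dump(rows, fields, delimiter="^", header=True):
    """Render the chosen fields as delimited text, one row per line."""
    lines = []
    if header:
        # Header line is prefixed with '#', as in the quiet output above.
        lines.append("#" + delimiter.join(fields))
    for row in rows:
        lines.append(delimiter.join(row[field] for field in fields))
    return "\n".join(lines)


# Default style: '^' delimiter with a header.
print(dump(ROWS, ["id", "name", "country"]))
# Only two fields, comma-delimited, no header.
print(dump(ROWS, ["id", "name"], delimiter=",", header=False))
```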
For advanced usage, refer to --help.