Solr Recipes and Tricks
Sometimes you may want to make Solr do things outside of the procedures described in the Local Development Quickstart. Here's a list of some of those things.
Most of these utilize jq, the JSON processing command line utility. Install it as appropriate for your OS, e.g. brew install jq or apt-get install jq or yum install jq.
Connect via SSH to a Geoportal server (or use your local desktop) and execute curl to dump all documents to $HOME/server-output.json on the server. Solr treats this as a query and wraps the search results in other metadata. We need .response.docs. Change rows= if you need more than 100,000.
Note: This dumps all fields in Solr including some calculated fields, and cannot be used to directly import back into a different Solr index.
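Since rows= caps how many documents come back, it can help to confirm that nothing was cut off before relying on a dump. The following is a minimal sketch, assuming the raw Solr response (before the .response.docs extraction) has been saved to a file. The function name check_dump_complete is hypothetical. It compares .response.numFound, the total number of matches, with the number of documents actually returned.

```shell
#!/bin/sh
# Hypothetical helper: report whether a raw Solr JSON response holds every
# matching document, or was cut short by the rows= limit.
# Usage: check_dump_complete raw-response.json
check_dump_complete() {
  # numFound is the total number of matches; docs holds only the rows returned.
  jq -r '
    .response
    | if .numFound > (.docs | length)
      then "truncated: \(.docs | length) of \(.numFound) documents"
      else "complete: \(.docs | length) documents"
      end
  ' "$1"
}
```

If the result reports a truncated dump, raise rows= and run the query again.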
# From a geoportal server...
# Query all documents, keep only .response.docs, and save to $HOME/server-output.json
$ curl 'http://localhost:8983/solr/production/search?q=*%3A*&start=0&rows=100000&wt=json' \
| jq '.response.docs' \
> ~/server-output.json
Same as previous, but run it from your workstation over SSH in one shot, saving the output file to $HOME/local-output.json on your workstation.
# From your workstation...
# Extract only the search result and save to $HOME/local-output.json
# (each & is escaped because the remote shell parses the curl command again)
$ ssh geoportal-server-address.example.edu \
curl 'http://localhost:8983/solr/production/search?q=*%3A*\&start=0\&rows=100000\&wt=json' \
| jq '.response.docs' \
> ~/local-output.json
To import documents into another Solr index, the dump requires some post-processing to remove dynamic (calculated by the index) fields. Fields to remove include:
_version_
timestamp
score
solr_bboxtype
solr_bboxtype__minX
solr_bboxtype__minY
solr_bboxtype__maxX
solr_bboxtype__maxY
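To see what removing these fields does to a single record, here is a small sketch that runs a del() filter over a made-up response. The function name strip_dynamic_fields is hypothetical, and the sample document with its id and dc_title_s values is illustrative only, not real Geoportal data.

```shell
#!/bin/sh
# Hypothetical helper: strip index-calculated fields from a Solr response
# read on stdin and print only the documents array (compact, one line).
strip_dynamic_fields() {
  jq -c 'del(.response.docs[]["_version_", "timestamp", "score",
                              "solr_bboxtype", "solr_bboxtype__minX",
                              "solr_bboxtype__minY", "solr_bboxtype__maxX",
                              "solr_bboxtype__maxY"])
         | .response.docs'
}

# Made-up sample response containing one document.
printf '%s\n' '{"response":{"numFound":1,"docs":[{"id":"example-1","dc_title_s":"Example","_version_":1,"score":1.0,"timestamp":"2020-01-01T00:00:00Z"}]}}' \
  | strip_dynamic_fields
```

The printed array should keep only id and dc_title_s for the sample document. Fields named in the filter but absent from a document are simply skipped.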
The following dumps documents via SSH to local-output.json on your workstation, ready for import into another Solr index.
# From your workstation...
# Strip the dynamic fields, keep only the documents, and save to $HOME/local-output.json
$ ssh geoportal-server-address.example.edu \
curl 'http://localhost:8983/solr/production/search?q=*%3A*\&start=0\&rows=100000\&wt=json' \
| jq 'del(.response.docs[]["_version_", "score", "timestamp", "solr_bboxtype", "solr_bboxtype__minX", "solr_bboxtype__minY", "solr_bboxtype__maxX", "solr_bboxtype__maxY"])|.response.docs' \
> ~/local-output.json
Solr provides bin/post as a command-line handler to ingest documents. Using JSON produced with the previous method, import the file to a Solr index:
# On your workstation or another Solr host
# "-c development" would load docs to a core named "development"
$ /path/to/solr/bin/post -c corename /path/to/local-output.json
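Before posting, it may be worth checking that the file really is a bare JSON array of documents, which is what the jq steps above produce. A minimal sketch, assuming jq is installed; the function name count_dump_docs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the number of documents in a dump file, or fail
# with a non-zero exit status if the top-level JSON value is not an array
# (for example, if the .response.docs extraction step was skipped).
# Usage: count_dump_docs /path/to/local-output.json
count_dump_docs() {
  jq 'if type == "array"
      then length
      else error("expected a JSON array of documents")
      end' "$1"
}
```

A count of 0, or an error, suggests the dump should be regenerated before running bin/post.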
Combine all of the above to dump documents out of Solr on a remote server and pipe that output DIRECTLY into Solr on your workstation.
# Run from your workstation
# Pipe that right into Solr on your workstation! ("-" at the end forces it to use stdin)
$ ssh geoportal-server-address.example.edu \
curl 'http://localhost:8983/solr/production/search?q=*%3A*\&start=0\&rows=100000\&wt=json' \
| jq 'del(.response.docs[]["_version_", "score", "timestamp", "solr_bboxtype", "solr_bboxtype__minX", "solr_bboxtype__minY", "solr_bboxtype__maxX", "solr_bboxtype__maxY"])|.response.docs' \
| /path/to/your/workstation/solr/bin/post -c corename -type application/json -
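A note on the \& in the URLs used with ssh above: ssh joins its arguments into one string and hands it to a shell on the remote side, so an unescaped & would be read there as the "run in background" operator and cut the URL short. The sketch below imitates that second round of parsing locally with sh -c. The remote function is a hypothetical stand-in for ssh and makes no network connection.

```shell
#!/bin/sh
# Hypothetical stand-in for ssh: like ssh, it joins its arguments with spaces
# and lets a second shell parse the resulting string.
remote() {
  sh -c "$*"
}

# Escaped: the second shell sees \& and passes a literal & on to echo,
# so the whole query string arrives intact.
remote echo 'start=0\&rows=10'

# Unescaped (left commented out): the second shell would treat & as a
# command separator, and echo would receive only "start=0".
# remote echo 'start=0&rows=10'
```

The single quotes protect the string from the local shell, and the backslash protects the & from the second one.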
The Geoportal has a nightly Rake task rake geoportal:export_data which dumps all records to public/data.json in the application root, including suppressed records. Its format is similar to the first example above and requires jq post-processing to remove dynamic fields before it can be used for import.
The dump task runs only once a day, so the records may not be as fresh as querying them directly from Solr.
# From your workstation...
# Strip the dynamic fields, keep only the documents, and save to $HOME/local-output.json
$ curl https://geo.btaa.org/data.json \
| jq 'del(.response.docs[]["_version_", "score", "timestamp", "solr_bboxtype", "solr_bboxtype__minX", "solr_bboxtype__minY", "solr_bboxtype__maxX", "solr_bboxtype__maxY"])|.response.docs' \
> ~/local-output.json