escp - An ElasticSearch Copy Utility

escp is a Python script for re-indexing or copying indices between or within ES clusters. It uses the official ElasticSearch Python API to expose a user-friendly interface, so it requires the 'elasticsearch' Python package to be installed (via pip or otherwise).

usage: escp [-h] [-i] [-s] [-c] [-m] [-w WORKERS] [-r SRCREGEX]
            [-C CHUNK_SIZE] [-S SHARDS] [-R REPLICAS]
            sourceIndex destIndex

Copy indices within or between ElasticSearch clusters

positional arguments:
  sourceIndex           The source cluster and index to transfer, e.g.,
                        http://localhost:9200/my.index
  destIndex             The destination cluster and index to transfer to,
                        e.g., http://localhost:9200/my.destindex

optional arguments:
  -h, --help            show this help message and exit
  -i, --ignore-health   Ignore the health status of the input and output
                        servers
  -s, --strict          Destination Index must *not* exist and must be created
  -c, --create-only     Only create new entries, if an entry with a given id
                        exists already it will not be overwritten/updated
  -m, --no-mapping      Do not copy mapping over from source
  -w WORKERS, --workers WORKERS
                        Number of bulk workers to run
  -r SRCREGEX, --source-regex SRCREGEX
                        Use formatting for destination based on source index
                        regex
  -C CHUNK_SIZE, --chunk-size CHUNK_SIZE
                        Number of docs to chunk up and send to destination
  -S SHARDS, --shards SHARDS
                        Number of shards for output index
  -R REPLICAS, --replicas REPLICAS
                        Number of replicas for output index
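
Both positional arguments combine a cluster address and an index name. As a rough illustration of how such an argument could be split, here is a minimal Python sketch. The function name `split_target` and the `http` default scheme are assumptions made for this example, not escp's actual code.

```python
def split_target(target, default_scheme="http"):
    """Split 'host:port/index' or 'http://host:port/index' into (cluster_url, index).

    Hypothetical helper for illustration only. Plain string partitioning is
    used instead of urllib.parse because a destination such as
    'output-?{1}' contains '?', which a URL parser would read as the start
    of a query string.
    """
    scheme, sep, rest = target.partition("://")
    if not sep:
        # No scheme given (e.g. 'localhost:9200/my.index'): assume the default.
        scheme, rest = default_scheme, target
    host, _, index = rest.partition("/")
    # index is an empty string when only a cluster is given ('localhost:9201/').
    return f"{scheme}://{host}", index
```

With this sketch, `split_target("localhost:9200/input_index")` yields the cluster URL `http://localhost:9200` and the index name `input_index`.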

Examples:

Copy an index within the same cluster on localhost, ensuring the output index doesn't already exist:

escp -s localhost:9200/input_index localhost:9200/output_index

Copy an index within the same cluster, ensuring entries are only created (the default behavior is to replace or update an entry that already exists):

escp -c localhost:9200/input_index localhost:9200/output_index
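
The difference between the default behavior and -c can be pictured in terms of the action dictionaries accepted by the bulk helpers in the 'elasticsearch' Python package, where `_op_type` selects between `index` (replace if present) and `create` (fail if the id already exists). The sketch below is an assumption about how such actions might be built; `to_bulk_action` is a hypothetical name and this is not taken from escp's source.

```python
def to_bulk_action(hit, dest_index, create_only=False):
    """Turn one search hit into an action dict for elasticsearch.helpers.bulk.

    Hypothetical helper for illustration only.
    """
    return {
        # "create" is rejected for an existing _id; "index" replaces the document.
        "_op_type": "create" if create_only else "index",
        "_index": dest_index,
        "_id": hit["_id"],
        "_source": hit["_source"],
    }
```

A list or generator of these dictionaries could then be passed to `elasticsearch.helpers.bulk` together with a client connected to the destination cluster.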

Copy multiple indices into a single index:

escp localhost:9200/input_indices-* localhost:9200/output_index

Copy multiple indices from one cluster to another:

escp localhost:9200/my.indices localhost:9201/

Copy multiple indices into multiple indices based on the source index name. A '*' character in the destination is replaced with the source index name. In the example below, every destination index has the same name as its source index, preceded by 'copy-':

escp localhost:9200/input_indices-* localhost:9200/copy-*
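
The effect of '*' in the destination can be sketched in a few lines of Python. This only illustrates the naming rule described above; `expand_destination` is a hypothetical name, and the source index names in the comment are made up for the example.

```python
def expand_destination(dest_pattern, source_index):
    """Replace '*' in the destination pattern with the source index name.

    Hypothetical helper for illustration only.
    """
    return dest_pattern.replace("*", source_index)


# Example source names (assumed): each one maps to its own destination index.
names = ["input_indices-a", "input_indices-b"]
destinations = [expand_destination("copy-*", name) for name in names]
```

Here `destinations` holds `copy-input_indices-a` and `copy-input_indices-b`.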

For more advanced usage, the -r or --source-regex flag parses the source index name and lets you use the parsed components in the destination name. For example, if you have a set of time series indices, you can change the prefix like so:

escp -r "logstash\-(\d+)\.(\d+)\.(\d+)" localhost:9200/logstash-2016.01.* "localhost:9200/output-?{1}.?{2}.?{3}"

This takes every index that matches the source query and reuses the date fields in the destination name. The regex uses Python regex syntax, and its capture groups (the parts in parentheses) fill the fields in the destination. Reference a group with ?{<number>}, where <number> is the index of the capture group, starting at 1 and going up to the number of groups in your regex.
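
To make the ?{<number>} substitution concrete, here is a minimal Python sketch using the standard `re` module. It is one possible way to implement the behavior described above, not escp's actual code; `format_destination` and `PLACEHOLDER` are hypothetical names.

```python
import re

# Matches placeholders of the form ?{1}, ?{2}, ...
PLACEHOLDER = re.compile(r"\?\{(\d+)\}")


def format_destination(src_regex, source_index, dest_template):
    """Fill ?{N} placeholders in dest_template with capture groups from source_index.

    Hypothetical helper for illustration only. Returns None when the
    source index name does not match the regex.
    """
    match = re.match(src_regex, source_index)
    if match is None:
        return None
    # Each ?{N} is replaced by the text captured by group N (1-based).
    return PLACEHOLDER.sub(lambda m: match.group(int(m.group(1))), dest_template)
```

For example, with the regex from the command above, a source index named `logstash-2016.01.15` and the template `output-?{1}.?{2}.?{3}` produce `output-2016.01.15`.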